<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.12: http://docutils.sourceforge.net/" />
<meta name="version" content="S5 1.1" />
<title>Implementing XML languages with lxml</title>
<style type="text/css">

/*
:Author: David Goodger (goodger@python.org)
:Id: $Id: html4css1.css 7614 2013-02-21 15:55:51Z milde $
:Copyright: This stylesheet has been placed in the public domain.

Default cascading style sheet for the HTML output of Docutils.

See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to
customize this style sheet.
*/

/* used to remove borders from tables and images */
.borderless, table.borderless td, table.borderless th {
  border: 0 }

table.borderless td, table.borderless th {
  /* Override padding for "table.docutils td" with "! important".
     The right padding separates the table cells. */
  padding: 0 0.5em 0 0 ! important }

.first {
  /* Override more specific margin styles with "! important". */
  margin-top: 0 ! important }

.last, .with-subtitle {
  margin-bottom: 0 ! important }

.hidden {
  display: none }

a.toc-backref {
  text-decoration: none ;
  color: black }

blockquote.epigraph {
  margin: 2em 5em ; }

dl.docutils dd {
  margin-bottom: 0.5em }

object[type="image/svg+xml"], object[type="application/x-shockwave-flash"] {
  overflow: hidden;
}

/* Uncomment (and remove this text!) to get bold-faced definition list terms
dl.docutils dt {
  font-weight: bold }
*/

div.abstract {
  margin: 2em 5em }

div.abstract p.topic-title {
  font-weight: bold ;
  text-align: center }

div.admonition, div.attention, div.caution, div.danger, div.error,
div.hint, div.important, div.note, div.tip, div.warning {
  margin: 2em ;
  border: medium outset ;
  padding: 1em }

div.admonition p.admonition-title, div.hint p.admonition-title,
div.important p.admonition-title, div.note p.admonition-title,
div.tip p.admonition-title {
  font-weight: bold ;
  font-family: sans-serif }

div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title, .code .error {
  color: red ;
  font-weight: bold ;
  font-family: sans-serif }

/* Uncomment (and remove this text!) to get reduced vertical space in
   compound paragraphs.
div.compound .compound-first, div.compound .compound-middle {
  margin-bottom: 0.5em }

div.compound .compound-last, div.compound .compound-middle {
  margin-top: 0.5em }
*/

div.dedication {
  margin: 2em 5em ;
  text-align: center ;
  font-style: italic }

div.dedication p.topic-title {
  font-weight: bold ;
  font-style: normal }

div.figure {
  margin-left: 2em ;
  margin-right: 2em }

div.footer, div.header {
  clear: both;
  font-size: smaller }

div.line-block {
  display: block ;
  margin-top: 1em ;
  margin-bottom: 1em }

div.line-block div.line-block {
  margin-top: 0 ;
  margin-bottom: 0 ;
  margin-left: 1.5em }

div.sidebar {
  margin: 0 0 0.5em 1em ;
  border: medium outset ;
  padding: 1em ;
  background-color: #ffffee ;
  width: 40% ;
  float: right ;
  clear: right }

div.sidebar p.rubric {
  font-family: sans-serif ;
  font-size: medium }

div.system-messages {
  margin: 5em }

div.system-messages h1 {
  color: red }

div.system-message {
  border: medium outset ;
  padding: 1em }

div.system-message p.system-message-title {
  color: red ;
  font-weight: bold }

div.topic {
  margin: 2em }

h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
  margin-top: 0.4em }

h1.title {
  text-align: center }

h2.subtitle {
  text-align: center }

hr.docutils {
  width: 75% }

img.align-left, .figure.align-left, object.align-left {
  clear: left ;
  float: left ;
  margin-right: 1em }

img.align-right, .figure.align-right, object.align-right {
  clear: right ;
  float: right ;
  margin-left: 1em }

img.align-center, .figure.align-center, object.align-center {
  display: block;
  margin-left: auto;
  margin-right: auto;
}

.align-left {
  text-align: left }

.align-center {
  clear: both ;
  text-align: center }

.align-right {
  text-align: right }

/* reset inner alignment in figures */
div.align-right {
  text-align: inherit }

/* div.align-center * { */
/*   text-align: left } */

ol.simple, ul.simple {
  margin-bottom: 1em }

ol.arabic {
  list-style: decimal }

ol.loweralpha {
  list-style: lower-alpha }

ol.upperalpha {
  list-style: upper-alpha }

ol.lowerroman {
  list-style: lower-roman }

ol.upperroman {
  list-style: upper-roman }

p.attribution {
  text-align: right ;
  margin-left: 50% }

p.caption {
  font-style: italic }

p.credits {
  font-style: italic ;
  font-size: smaller }

p.label {
  white-space: nowrap }

p.rubric {
  font-weight: bold ;
  font-size: larger ;
  color: maroon ;
  text-align: center }

p.sidebar-title {
  font-family: sans-serif ;
  font-weight: bold ;
  font-size: larger }

p.sidebar-subtitle {
  font-family: sans-serif ;
  font-weight: bold }

p.topic-title {
  font-weight: bold }

pre.address {
  margin-bottom: 0 ;
  margin-top: 0 ;
  font: inherit }

pre.literal-block, pre.doctest-block, pre.math, pre.code {
  margin-left: 2em ;
  margin-right: 2em }

pre.code .ln { color: grey; } /* line numbers */
pre.code, code { background-color: #eeeeee }
pre.code .comment, code .comment { color: #5C6576 }
pre.code .keyword, code .keyword { color: #3B0D06; font-weight: bold }
pre.code .literal.string, code .literal.string { color: #0C5404 }
pre.code .name.builtin, code .name.builtin { color: #352B84 }
pre.code .deleted, code .deleted { background-color: #DEB0A1}
pre.code .inserted, code .inserted { background-color: #A3D289}

span.classifier {
  font-family: sans-serif ;
  font-style: oblique }

span.classifier-delimiter {
  font-family: sans-serif ;
  font-weight: bold }

span.interpreted {
  font-family: sans-serif }

span.option {
  white-space: nowrap }

span.pre {
  white-space: pre }

span.problematic {
  color: red }

span.section-subtitle {
  /* font-size relative to parent (h1..h6 element) */
  font-size: 80% }

table.citation {
  border-left: solid 1px gray;
  margin-left: 1px }

table.docinfo {
  margin: 2em 4em }

table.docutils {
  margin-top: 0.5em ;
  margin-bottom: 0.5em }

table.footnote {
  border-left: solid 1px black;
  margin-left: 1px }

table.docutils td, table.docutils th,
table.docinfo td, table.docinfo th {
  padding-left: 0.5em ;
  padding-right: 0.5em ;
  vertical-align: top }

table.docutils th.field-name, table.docinfo th.docinfo-name {
  font-weight: bold ;
  text-align: left ;
  white-space: nowrap ;
  padding-left: 0 }

/* "booktabs" style (no vertical lines) */
table.docutils.booktabs {
  border: 0px;
  border-top: 2px solid;
  border-bottom: 2px solid;
  border-collapse: collapse;
}
table.docutils.booktabs * {
  border: 0px;
}
table.docutils.booktabs th {
  border-bottom: thin solid;
  text-align: left;
}

h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
  font-size: 100% }

ul.auto-toc {
  list-style-type: none }

</style>
<!-- configuration parameters -->
<meta name="defaultView" content="slideshow" />
<meta name="controlVis" content="hidden" />
<!-- style sheet links -->
<script src="ui/default/slides.js" type="text/javascript"></script>
<link rel="stylesheet" href="ui/default/slides.css"
      type="text/css" media="projection" id="slideProj" />
<link rel="stylesheet" href="ui/default/outline.css"
      type="text/css" media="screen" id="outlineStyle" />
<link rel="stylesheet" href="ui/default/print.css"
      type="text/css" media="print" id="slidePrint" />
<link rel="stylesheet" href="ui/default/opera.css"
      type="text/css" media="projection" id="operaFix" />
</head>
<body>
<div class="layout">
<div id="controls"></div>
<div id="currentSlide"></div>
<div id="header">

</div>
<div id="footer">
<h1>Implementing XML languages with lxml</h1>
<h2>Dr. Stefan Behnel, EuroPython 2008, Vilnius/Lietuva</h2>
</div>
</div>
<div class="presentation">
<div class="slide" id="slide0">
<h1 class="title">Implementing XML languages with lxml</h1>
<h2 class="subtitle" id="dr-stefan-behnel">Dr. Stefan Behnel</h2>

<p class="center"><a class="reference external" href="http://codespeak.net/lxml/">http://codespeak.net/lxml/</a></p>
<p class="center"><a class="reference external" href="mailto:lxml-dev&#64;codespeak.net">lxml-dev&#64;codespeak.net</a></p>
<img alt="tagpython.png" class="center" src="tagpython.png" />

</div>
<div class="slide" id="what-is-an-xml-language">
<h1>What is an »XML language«?</h1>
<ul class="simple">
<li>a language in XML notation</li>
<li>aka »XML dialect«<ul>
<li>except that it's not a dialect</li>
</ul>
</li>
<li>Examples:<ul>
<li>XML Schema</li>
<li>Atom/RSS</li>
<li>(X)HTML</li>
<li>Open Document Format</li>
<li>SOAP</li>
<li>... add your own here</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="popular-mistakes-to-avoid-1">
<h1>Popular mistakes to avoid (1)</h1>
<p>&quot;That's easy, I can use regular expressions!&quot;</p>
<p class="incremental center">No, you can't.</p>
</div>
<div class="slide" id="popular-mistakes-to-avoid-2">
<h1>Popular mistakes to avoid (2)</h1>
<p>&quot;This is tree data, I'll take the DOM!&quot;</p>
</div>
<div class="slide" id="id1">
<h1>Popular mistakes to avoid (2)</h1>
<p>&quot;This is tree data, I'll take the DOM!&quot;</p>
<ul class="simple">
<li>DOM is ubiquitous, but it's as complicated as Java</li>
<li>uglify your application with tons of DOM code to<ul>
<li>walk over non-element nodes to find the data you need</li>
<li>convert text content to other data types</li>
<li>modify the XML tree in memory</li>
</ul>
</li>
</ul>
<p>=&gt; write verbose, redundant, hard-to-maintain code</p>
</div>
<div class="slide" id="popular-mistakes-to-avoid-3">
<h1>Popular mistakes to avoid (3)</h1>
<p>&quot;SAX is <em>so</em> fast and consumes <em>no</em> memory!&quot;</p>
</div>
<div class="slide" id="id2">
<h1>Popular mistakes to avoid (3)</h1>
<p>&quot;SAX is <em>so</em> fast and consumes <em>no</em> memory!&quot;</p>
<ul class="simple">
<li>but <em>writing</em> SAX code is <em>not</em> fast!</li>
<li>write error-prone, state-keeping SAX code to<ul>
<li>figure out where you are</li>
<li>find the sections you need</li>
<li>convert text content to other data types</li>
<li>copy the XML data into custom data classes</li>
<li>... and don't forget the way back into XML!</li>
</ul>
</li>
</ul>
<p>=&gt; write confusing state-machine code</p>
<p>=&gt; debugging into existence</p>
</div>
<div class="slide" id="working-with-xml">
<h1>Working with XML</h1>
<blockquote>
<p><strong>Getting XML work done</strong></p>
<p>(instead of wasting time)</p>
</blockquote>
</div>
<div class="slide" id="how-can-you-work-with-xml">
<h1>How can you work with XML?</h1>
<ul class="simple">
<li>Preparation:<ul>
<li>Implement usable data classes as an abstraction layer</li>
<li>Implement a mapping from XML to the data classes</li>
<li>Implement a mapping from the data classes to XML</li>
</ul>
</li>
<li>Workflow:<ul>
<li>parse XML data</li>
<li>map XML data to data classes</li>
<li>work with data classes</li>
<li>map data classes to XML</li>
<li>serialise XML</li>
</ul>
</li>
</ul>
<ul class="incremental simple">
<li>Approach:<ul>
<li>get rid of XML and do everything in your own code</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="what-if-you-could-simplify-this">
<h1>What if you could simplify this?</h1>
<ul class="simple">
<li>Preparation:<ul>
<li>Extend usable XML API classes into an abstraction layer</li>
</ul>
</li>
<li>Workflow:<ul>
<li>parse XML data into XML API classes</li>
<li>work with XML API classes</li>
<li>serialise XML</li>
</ul>
</li>
</ul>
<ul class="incremental simple">
<li>Approach:<ul>
<li>cover only the quirks of XML and make it work <em>for</em> you</li>
</ul>
</li>
</ul>
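<p>The simplified workflow can be sketched with lxml's custom element classes. This is a minimal sketch: <tt class="docutils literal">ElementBase</tt>, <tt class="docutils literal">ElementDefaultClassLookup</tt> and <tt class="docutils literal">XMLParser.set_element_class_lookup()</tt> are lxml APIs, while the <tt class="docutils literal">honk</tt> element, its <tt class="docutils literal">honking</tt> attribute and the <tt class="docutils literal">HonkElement</tt> class are hypothetical names made up for illustration.</p>
<pre class="literal-block">
```python
from lxml import etree
from lxml.builder import E


class HonkElement(etree.ElementBase):
    """Hypothetical domain class: a typed view on top of the plain XML API."""

    @property
    def honking(self):
        # attribute values are strings, so convert to a Python bool here
        return self.get("honking") == "true"


# Preparation: make the parser create HonkElement objects for all elements
parser = etree.XMLParser()
parser.set_element_class_lookup(
    etree.ElementDefaultClassLookup(element=HonkElement))

# Some XML input (generated here to keep the example self-contained)
data = etree.tostring(E.root(E.honk(honking="true")))

# Workflow: parse into the API classes, work with them, serialise
root = etree.fromstring(data, parser)
print(root[0].honking)            # True
root[0].set("honking", "false")
print(root[0].honking)            # False
result = etree.tostring(root)
```
</pre>
<p>lxml may discard and recreate element proxies, so such classes should keep their state in the XML tree (attributes, text) rather than in Python instance attributes.</p>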
</div>
<div class="slide" id="id3">
<h1>What if you could simplify this ...</h1>
<ul class="simple">
<li>... without sacrificing usability or flexibility?</li>
<li>... using a high-speed, full-featured, pythonic XML toolkit?</li>
<li>... with the power of XPath, XSLT and XML validation?</li>
</ul>
<p class="incremental center">... then »lxml« is your friend!</p>
</div>
<div class="slide" id="overview">
<h1>Overview</h1>
<ul class="simple">
<li>What is lxml?<ul>
<li>what &amp; who</li>
</ul>
</li>
<li>How do you use it?<ul>
<li>Lesson 0: quick API overview<ul>
<li>ElementTree concepts and lxml features</li>
</ul>
</li>
<li>Lesson 1: parse XML<ul>
<li>how to get XML data into memory</li>
</ul>
</li>
<li>Lesson 2: generate XML<ul>
<li>how to write an XML generator for a language</li>
</ul>
</li>
<li>Lesson 3: working with XML trees made easy<ul>
<li>how to write an XML API for a language</li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="what-is-lxml">
<h1>What is lxml?</h1>
<ul class="simple">
<li>a fast, full-featured toolkit for XML and HTML handling<ul>
<li><a class="reference external" href="http://codespeak.net/lxml/">http://codespeak.net/lxml/</a></li>
<li><a class="reference external" href="mailto:lxml-dev&#64;codespeak.net">lxml-dev&#64;codespeak.net</a></li>
</ul>
</li>
<li>based on and inspired by<ul>
<li>the C libraries libxml2 and libxslt (by Daniel Veillard)</li>
<li>the ElementTree API (by Fredrik Lundh)</li>
<li>the Cython compiler (by Robert Bradshaw, Greg Ewing &amp; me)</li>
<li>the Python language (by Guido &amp; [<em>paste Misc/ACKS here</em>])</li>
<li>user feedback, ideas and patches (by you!)<ul>
<li>keep doing that, we love you all!</li>
</ul>
</li>
</ul>
</li>
<li>maintained (and largely written) by myself<ul>
<li>initial design and implementation by Martijn Faassen</li>
<li>extensive HTML API and tools by Ian Bicking</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="what-do-you-get-for-your-money">
<h1>What do you get for your money?</h1>
<ul class="simple">
<li>many tools in one:<ul>
<li>Generic, ElementTree compatible XML API: <strong>lxml.etree</strong><ul>
<li>but faster for many tasks and much more feature-rich</li>
</ul>
</li>
<li>Special tool set for HTML handling: <strong>lxml.html</strong></li>
<li>Special API for pythonic data binding: <strong>lxml.objectify</strong></li>
<li>General-purpose path languages: XPath and CSS selectors</li>
<li>Validation: DTD, XML Schema, RelaxNG, Schematron</li>
<li>XSLT, XInclude, C14N, ...</li>
<li>Fast tree iteration, event-driven parsing, ...</li>
</ul>
</li>
<li>it's free, but it's worth every €-Cent!<ul>
<li>what users say:<ul>
<li>»no qualification, I would recommend lxml for just about any
HTML task«</li>
<li>»THE tool [...] for newbies and experienced developers«</li>
<li>»you can do pretty much anything with an intuitive API«</li>
<li>»lxml takes all the pain out of XML«</li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="lesson-0-a-quick-overview">
<h1>Lesson 0: a quick overview</h1>
<blockquote>
<p>why <strong>»lxml takes all the pain out of XML«</strong></p>
<p>(a quick overview of lxml features and ElementTree concepts)</p>
</blockquote>
<!-- >>> from lxml import etree, cssselect, html
>>> some_xml_data  = "<root><speech class='dialog'><p>So be it!</p></speech><p>stuff</p></root>"
>>> some_html_data = "<p>Just a quick note<br>next line</p>"
>>> xml_tree = etree.XML(some_xml_data)
>>> html_tree = html.fragment_fromstring(some_html_data) -->
</div>
<div class="slide" id="namespaces-in-elementtree">
<h1>Namespaces in ElementTree</h1>
<ul>
<li><p class="first">uses Clark notation:</p>
<ul class="simple">
<li>wrap namespace URI in <tt class="docutils literal"><span class="pre">{...}</span></tt></li>
<li>append the tag name</li>
</ul>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">tag</span> <span class="o">=</span> <span class="s2">&quot;{http://www.w3.org/the/namespace}tagname&quot;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">element</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">Element</span><span class="p">(</span><span class="n">tag</span><span class="p">)</span>
</pre></div>
</li>
<li><p class="first">no prefixes!</p>
</li>
<li><p class="first">a single, self-contained tag identifier</p>
</li>
</ul>
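<p>A small sketch of the same idea, reusing the placeholder namespace URI from the slide: the prefix only shows up when serialising, and <tt class="docutils literal">etree.QName</tt> splits a Clark-notation tag into namespace and local name.</p>
<pre class="literal-block">
```python
from lxml import etree

NS = "http://www.w3.org/the/namespace"
tag = "{%s}tagname" % NS                 # Clark notation: {namespace URI}local name

plain = etree.Element(tag)               # lxml picks a prefix when serialising
prefixed = etree.Element(tag, nsmap={"x": NS})   # request the prefix "x"

# Both elements carry the same tag identifier, whatever the prefix
print(plain.tag == prefixed.tag == tag)  # True

qname = etree.QName(plain.tag)
print(qname.namespace)                   # http://www.w3.org/the/namespace
print(qname.localname)                   # tagname
```
</pre>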
</div>
<div class="slide" id="text-content-in-elementtree">
<h1>Text content in ElementTree</h1>
<ul>
<li><p class="first">uses <tt class="docutils literal">.text</tt> (text before the first child) and <tt class="docutils literal">.tail</tt> (text after the element's end tag) attributes:</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">div</span> <span class="o">=</span> <span class="n">html</span><span class="o">.</span><span class="n">fragment_fromstring</span><span class="p">(</span>
<span class="gp">... </span>    <span class="s2">&quot;&lt;div&gt;&lt;p&gt;a paragraph&lt;br&gt;split in two&lt;/p&gt; parts&lt;/div&gt;&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">p</span> <span class="o">=</span> <span class="n">div</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">br</span> <span class="o">=</span> <span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">p</span><span class="o">.</span><span class="n">text</span>
<span class="go">&#39;a paragraph&#39;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">br</span><span class="o">.</span><span class="n">text</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">br</span><span class="o">.</span><span class="n">tail</span>
<span class="go">&#39;split in two&#39;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">p</span><span class="o">.</span><span class="n">tail</span>
<span class="go">&#39; parts&#39;</span>
</pre></div>
</li>
<li><p class="first">no text nodes!</p>
<ul class="simple">
<li>simplifies tree traversal a lot</li>
<li>simplifies many XML algorithms</li>
</ul>
</li>
</ul>
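<p>Without text nodes, collecting the text of a subtree means reading <tt class="docutils literal">.text</tt> plus each child's text and <tt class="docutils literal">.tail</tt>. The sketch below rebuilds the slide's example with <tt class="docutils literal">SubElement()</tt> and compares a hand-written recursive helper (<tt class="docutils literal">all_text</tt> is a hypothetical name) with lxml's <tt class="docutils literal">itertext()</tt>.</p>
<pre class="literal-block">
```python
from lxml import etree

# Same structure as the slide: div / p("a paragraph", br, "split in two"), " parts"
div = etree.Element("div")
p = etree.SubElement(div, "p")
p.text = "a paragraph"          # text before the first child of p
br = etree.SubElement(p, "br")
br.tail = "split in two"        # text after br, still inside p
p.tail = " parts"               # text after p, inside div


def all_text(el):
    """Concatenate el.text, then each child's text and tail, recursively."""
    parts = [el.text or ""]
    for child in el:
        parts.append(all_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


print(all_text(div))               # a paragraphsplit in two parts
print("".join(div.itertext()))     # same result via lxml's itertext()
```
</pre>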
</div>
<div class="slide" id="attributes-in-elementtree">
<h1>Attributes in ElementTree</h1>
<ul>
<li><p class="first">uses <tt class="docutils literal">.get()</tt> and <tt class="docutils literal">.set()</tt> methods:</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">root</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span>
<span class="gp">... </span>    <span class="s1">&#39;&lt;root a=&quot;the value&quot; b=&quot;of an&quot; c=&quot;attribute&quot;/&gt;&#39;</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">root</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;a&#39;</span><span class="p">)</span>
<span class="go">&#39;the value&#39;</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">root</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s2">&quot;THE value&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">root</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;a&#39;</span><span class="p">)</span>
<span class="go">&#39;THE value&#39;</span>
</pre></div>
</li>
<li><p class="first">or the <tt class="docutils literal">.attrib</tt> dictionary property:</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">d</span> <span class="o">=</span> <span class="n">root</span><span class="o">.</span><span class="n">attrib</span>

<span class="gp">&gt;&gt;&gt; </span><span class="nb">list</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">d</span><span class="o">.</span><span class="n">keys</span><span class="p">()))</span>
<span class="go">[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="nb">list</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">d</span><span class="o">.</span><span class="n">values</span><span class="p">()))</span>
<span class="go">[&#39;THE value&#39;, &#39;attribute&#39;, &#39;of an&#39;]</span>
</pre></div>
</li>
</ul>
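<p>A few more attribute details, sketched with made-up attribute names: <tt class="docutils literal">.get()</tt> takes an optional default, values are always strings, namespaced attributes use Clark notation like tags, and <tt class="docutils literal">.attrib</tt> supports dict-style deletion.</p>
<pre class="literal-block">
```python
from lxml import etree

root = etree.Element("root", a="the value")

# .get() returns None, or the given default, for missing attributes
print(root.get("missing"))              # None
print(root.get("missing", "default"))   # default

# attribute values are always strings: convert explicitly
root.set("count", str(3))
print(int(root.get("count")) + 1)       # 4

# namespaced attributes use Clark notation, just like tags
XLINK = "http://www.w3.org/1999/xlink"
root.set("{%s}href" % XLINK, "#target")

# .attrib behaves like a dict, including deletion
del root.attrib["a"]
print(sorted(root.attrib.keys()))
```
</pre>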
</div>
<div class="slide" id="tree-iteration-in-lxml-etree-1">
<h1>Tree iteration in lxml.etree (1)</h1>
<!-- >>> import collections -->
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">root</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span>
<span class="gp">... </span>  <span class="s2">&quot;&lt;root&gt; &lt;a&gt;&lt;b/&gt;&lt;b/&gt;&lt;/a&gt; &lt;c&gt;&lt;d/&gt;&lt;e&gt;&lt;f/&gt;&lt;/e&gt;&lt;g/&gt;&lt;/c&gt; &lt;/root&gt;&quot;</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">([</span><span class="n">child</span><span class="o">.</span><span class="n">tag</span> <span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">root</span><span class="p">])</span>   <span class="c1"># children</span>
<span class="go">[&#39;a&#39;, &#39;c&#39;]</span>

<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">([</span><span class="n">el</span><span class="o">.</span><span class="n">tag</span> <span class="k">for</span> <span class="n">el</span> <span class="ow">in</span> <span class="n">root</span><span class="o">.</span><span class="n">iter</span><span class="p">()])</span>  <span class="c1"># self and descendants</span>
<span class="go">[&#39;root&#39;, &#39;a&#39;, &#39;b&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;, &#39;f&#39;, &#39;g&#39;]</span>

<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">([</span><span class="n">el</span><span class="o">.</span><span class="n">tag</span> <span class="k">for</span> <span class="n">el</span> <span class="ow">in</span> <span class="n">root</span><span class="o">.</span><span class="n">iterdescendants</span><span class="p">()])</span>
<span class="go">[&#39;a&#39;, &#39;b&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;, &#39;f&#39;, &#39;g&#39;]</span>


<span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">iter_breadth_first</span><span class="p">(</span><span class="n">root</span><span class="p">):</span>
<span class="gp">... </span>    <span class="n">bfs_queue</span> <span class="o">=</span> <span class="n">collections</span><span class="o">.</span><span class="n">deque</span><span class="p">([</span><span class="n">root</span><span class="p">])</span>
<span class="gp">... </span>    <span class="k">while</span> <span class="n">bfs_queue</span><span class="p">:</span>
<span class="gp">... </span>        <span class="n">el</span> <span class="o">=</span> <span class="n">bfs_queue</span><span class="o">.</span><span class="n">popleft</span><span class="p">()</span>  <span class="c1"># pop next element</span>
<span class="gp">... </span>        <span class="n">bfs_queue</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">el</span><span class="p">)</span>      <span class="c1"># append its children</span>
<span class="gp">... </span>        <span class="k">yield</span> <span class="n">el</span>

<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">([</span><span class="n">el</span><span class="o">.</span><span class="n">tag</span> <span class="k">for</span> <span class="n">el</span> <span class="ow">in</span> <span class="n">iter_breadth_first</span><span class="p">(</span><span class="n">root</span><span class="p">)])</span>
<span class="go">[&#39;root&#39;, &#39;a&#39;, &#39;c&#39;, &#39;b&#39;, &#39;b&#39;, &#39;d&#39;, &#39;e&#39;, &#39;g&#39;, &#39;f&#39;]</span>
</pre></div>
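<p>Besides <tt class="docutils literal">iter()</tt> and <tt class="docutils literal">iterdescendants()</tt>, elements offer a tag filter and the other XPath-like axes. A short sketch on the same tree, built here with the <tt class="docutils literal">lxml.builder.E</tt> factory instead of a string:</p>
<pre class="literal-block">
```python
from lxml.builder import E

root = E.root(E.a(E.b(), E.b()), E.c(E.d(), E.e(E.f()), E.g()))

# iter() accepts a tag filter
print([el.tag for el in root.iter("b")])              # ['b', 'b']

f = root.find(".//f")
print([el.tag for el in f.iterancestors()])           # ['e', 'c', 'root']

d = root.find("c/d")
print([el.tag for el in d.itersiblings()])            # ['e', 'g']

g = root.find("c/g")
print([el.tag for el in g.itersiblings(preceding=True)])  # ['e', 'd']
```
</pre>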
</div>
<div class="slide" id="tree-iteration-in-lxml-etree-2">
<h1>Tree iteration in lxml.etree (2)</h1>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">root</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span>
<span class="gp">... </span>  <span class="s2">&quot;&lt;root&gt; &lt;a&gt;&lt;b/&gt;&lt;b/&gt;&lt;/a&gt; &lt;c&gt;&lt;d/&gt;&lt;e&gt;&lt;f/&gt;&lt;/e&gt;&lt;g/&gt;&lt;/c&gt; &lt;/root&gt;&quot;</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">tree_walker</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">iterwalk</span><span class="p">(</span><span class="n">root</span><span class="p">,</span> <span class="n">events</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;start&#39;</span><span class="p">,</span> <span class="s1">&#39;end&#39;</span><span class="p">))</span>

<span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">element</span><span class="p">)</span> <span class="ow">in</span> <span class="n">tree_walker</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;</span><span class="si">%s</span><span class="s2"> (</span><span class="si">%s</span><span class="s2">)&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">element</span><span class="o">.</span><span class="n">tag</span><span class="p">,</span> <span class="n">event</span><span class="p">))</span>
<span class="go">root (start)</span>
<span class="go">a (start)</span>
<span class="go">b (start)</span>
<span class="go">b (end)</span>
<span class="go">b (start)</span>
<span class="go">b (end)</span>
<span class="go">a (end)</span>
<span class="go">c (start)</span>
<span class="go">d (start)</span>
<span class="go">d (end)</span>
<span class="go">e (start)</span>
<span class="go">f (start)</span>
<span class="go">f (end)</span>
<span class="go">e (end)</span>
<span class="go">g (start)</span>
<span class="go">g (end)</span>
<span class="go">c (end)</span>
<span class="go">root (end)</span>
</pre></div>
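<p>The start/end events are enough to track the nesting depth. A minimal sketch (<tt class="docutils literal">indented_outline</tt> is a hypothetical helper) that prints the same tree as an indented outline:</p>
<pre class="literal-block">
```python
from lxml import etree
from lxml.builder import E

root = E.root(E.a(E.b(), E.b()), E.c(E.d(), E.e(E.f()), E.g()))


def indented_outline(tree_root):
    """Return one line per element, indented by its depth below tree_root."""
    depth = 0
    lines = []
    for event, element in etree.iterwalk(tree_root, events=("start", "end")):
        if event == "start":
            lines.append("  " * depth + element.tag)
            depth += 1          # entering the element's subtree
        else:
            depth -= 1          # leaving it again
    return lines


print("\n".join(indented_outline(root)))
```
</pre>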
</div>
<div class="slide" id="path-languages-in-lxml">
<h1>Path languages in lxml</h1>
<div class="highlight"><pre><span class="nt">&lt;root&gt;</span>
  <span class="nt">&lt;speech</span> <span class="na">class=</span><span class="s">&#39;dialog&#39;</span><span class="nt">&gt;&lt;p&gt;</span>So be it!<span class="nt">&lt;/p&gt;&lt;/speech&gt;</span>
  <span class="nt">&lt;p&gt;</span>stuff<span class="nt">&lt;/p&gt;</span>
<span class="nt">&lt;/root&gt;</span>
</pre></div>
<ul>
<li><p class="first">search it with XPath</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">find_paragraphs</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">XPath</span><span class="p">(</span><span class="s2">&quot;//p&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">paragraphs</span> <span class="o">=</span> <span class="n">find_paragraphs</span><span class="p">(</span><span class="n">xml_tree</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">([</span> <span class="n">p</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">paragraphs</span> <span class="p">])</span>
<span class="go">[&#39;So be it!&#39;, &#39;stuff&#39;]</span>
</pre></div>
</li>
<li><p class="first">search it with CSS selectors</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">find_dialogs</span> <span class="o">=</span> <span class="n">cssselect</span><span class="o">.</span><span class="n">CSSSelector</span><span class="p">(</span><span class="s2">&quot;speech.dialog p&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">paragraphs</span> <span class="o">=</span> <span class="n">find_dialogs</span><span class="p">(</span><span class="n">xml_tree</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">([</span> <span class="n">p</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">paragraphs</span> <span class="p">])</span>
<span class="go">[&#39;So be it!&#39;]</span>
</pre></div>
</li>
</ul>
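<p>Compiled <tt class="docutils literal">XPath</tt> objects can be reused and accept XPath variables as keyword arguments. A sketch on the same sample document, rebuilt with <tt class="docutils literal">lxml.builder.E</tt> (the variable name <tt class="docutils literal">cls</tt> is arbitrary):</p>
<pre class="literal-block">
```python
from lxml import etree
from lxml.builder import E

xml_tree = E.root(
    E.speech(E.p("So be it!"), {"class": "dialog"}),
    E.p("stuff"),
)

# $cls is an XPath variable, bound by keyword when calling the compiled XPath
find_speech_text = etree.XPath("//speech[@class = $cls]/p/text()")
print(find_speech_text(xml_tree, cls="dialog"))   # ['So be it!']
print(find_speech_text(xml_tree, cls="other"))    # []

# XPath expressions can also return numbers or booleans
count_paragraphs = etree.XPath("count(//p)")
print(count_paragraphs(xml_tree))                 # 2.0
```
</pre>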
</div>
<div class="slide" id="summary-of-lesson-0">
<h1>Summary of lesson 0</h1>
<ul class="simple">
<li>lxml comes with various tools<ul>
<li>that aim to hide the quirks of XML</li>
<li>that simplify finding and handling data</li>
<li>that make XML a pythonic tool by itself</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="lesson-1-parsing-xml-html">
<h1>Lesson 1: parsing XML/HTML</h1>
<blockquote>
<p><strong>The input side</strong></p>
<p>(a quick overview)</p>
</blockquote>
</div>
<div class="slide" id="parsing-xml-and-html-from">
<h1>Parsing XML and HTML from ...</h1>
<ul class="simple">
<li>strings: <tt class="docutils literal">fromstring(xml_data)</tt><ul>
<li>byte strings, but also unicode strings (without an XML encoding declaration)</li>
</ul>
</li>
<li>filenames: <tt class="docutils literal">parse(filename)</tt></li>
<li>HTTP/FTP URLs: <tt class="docutils literal">parse(url)</tt></li>
<li>file objects: <tt class="docutils literal">parse(f)</tt><ul>
<li>open files in binary mode: <tt class="docutils literal">f = open(filename, 'rb')</tt></li>
</ul>
</li>
<li>file-like objects: <tt class="docutils literal">parse(f)</tt><ul>
<li>only need a <tt class="docutils literal">f.read(size)</tt> method</li>
</ul>
</li>
<li>data chunks: <tt class="docutils literal">parser.feed(xml_chunk)</tt><ul>
<li><tt class="docutils literal">result = parser.close()</tt></li>
</ul>
</li>
</ul>
<p class="small right">(parsing from strings and filenames/URLs frees the GIL)</p>
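<p>Two of these input paths, sketched on bytes that lxml serialises itself: a file-like object that implements nothing but <tt class="docutils literal">read(size)</tt> (the <tt class="docutils literal">MinimalReader</tt> class is hypothetical), and feeding a parser chunk by chunk (the 4-byte chunk size is arbitrary).</p>
<pre class="literal-block">
```python
import io
from lxml import etree
from lxml.builder import E

# Sample input: bytes serialised by lxml itself
data = etree.tostring(E.root(E.child("some text")))


class MinimalReader:
    """Hypothetical file-like wrapper that implements only read(size)."""

    def __init__(self, data):
        self._stream = io.BytesIO(data)

    def read(self, size):
        return self._stream.read(size)


# file-like object: parse() returns an ElementTree
tree = etree.parse(MinimalReader(data))
print(tree.getroot()[0].text)        # some text

# data chunks: feed the parser piece by piece, close() returns the root
parser = etree.XMLParser()
for start in range(0, len(data), 4):
    parser.feed(data[start:start + 4])
root = parser.close()
print(root[0].text)                  # some text
```
</pre>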
</div>
<div class="slide" id="example-parsing-from-a-string">
<h1>Example: parsing from a string</h1>
<ul>
<li><p class="first">using the <tt class="docutils literal">fromstring()</tt> function:</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">root_element</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span><span class="n">some_xml_data</span><span class="p">)</span>
</pre></div>
</li>
<li><p class="first">using the <tt class="docutils literal">fromstring()</tt> function with a specific parser:</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">HTMLParser</span><span class="p">(</span><span class="n">remove_comments</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">root_element</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span><span class="n">some_html_data</span><span class="p">,</span> <span class="n">parser</span><span class="p">)</span>
</pre></div>
</li>
<li><p class="first">or the <tt class="docutils literal">XML()</tt> and <tt class="docutils literal">HTML()</tt> aliases for literals in code:</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">root_element</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">XML</span><span class="p">(</span><span class="s2">&quot;&lt;root&gt;&lt;child/&gt;&lt;/root&gt;&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">root_element</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">HTML</span><span class="p">(</span><span class="s2">&quot;&lt;p&gt;some&lt;br&gt;paragraph&lt;/p&gt;&quot;</span><span class="p">)</span>
</pre></div>
</li>
</ul>
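<p>A runnable variant of the calls above, assuming Python 3 and made-up sample data. It also shows a detail worth knowing: the HTML parser completes a fragment into a full document, so the returned root is the <tt class="docutils literal">html</tt> element rather than the paragraph that was passed in:</p>

```python
from lxml import etree

# XML(): an alias of fromstring() that reads well for literals
root_element = etree.XML("<root><child/></root>")
print(root_element.tag, len(root_element))     # root 1

# a dedicated HTML parser instance carries its own configuration
parser = etree.HTMLParser(remove_comments=True)
html_root = etree.fromstring("<p>some <b>bold</b> text</p>", parser)

# the fragment has been wrapped into html and body elements
print(html_root.tag)                           # html
print(html_root.find("body/p/b").text)         # bold
```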
</div>
<div class="slide" id="parsing-xml-into">
<h1>Parsing XML into ...</h1>
<ul class="simple">
<li>a tree in memory<ul>
<li><tt class="docutils literal">parse()</tt> and <tt class="docutils literal">fromstring()</tt> functions</li>
</ul>
</li>
<li>a tree in memory, but step-by-step with a generator<ul>
<li><tt class="docutils literal">iterparse()</tt> generates <tt class="docutils literal">(start/end, element)</tt> events</li>
<li>tree can be cleaned up to save space</li>
</ul>
</li>
<li>SAX-like callbacks without building a tree<ul>
<li><tt class="docutils literal">parse()</tt> and <tt class="docutils literal">fromstring()</tt> functions</li>
<li>pass a <tt class="docutils literal">target</tt> object into the parser</li>
</ul>
</li>
</ul>
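<p>A short sketch of the two incremental styles, assuming Python 3 and a made-up document; the <tt class="docutils literal">EntryCounter</tt> target class is a hypothetical example, not part of lxml. <tt class="docutils literal">iterparse()</tt> still builds a tree, which is why the loop clears each element once it has been handled, whereas the parser target only receives callbacks and no tree is built:</p>

```python
import io
from lxml import etree

xml_data = b"<feed><entry>a</entry><entry>b</entry><entry>c</entry></feed>"

# 1. iterparse(): build the tree step by step, one event at a time
texts = []
for event, element in etree.iterparse(io.BytesIO(xml_data),
                                      events=("end",), tag="entry"):
    texts.append(element.text)
    element.clear()   # drop the handled subtree to save memory
print(texts)          # ['a', 'b', 'c']

# 2. parser target: SAX-like callbacks, no tree is built at all
class EntryCounter:
    """Hypothetical target that counts start tags named 'entry'."""

    def __init__(self):
        self.count = 0

    def start(self, tag, attrib):
        if tag == "entry":
            self.count += 1

    def close(self):
        # whatever close() returns becomes the result of parsing
        return self.count

parser = etree.XMLParser(target=EntryCounter())
entry_count = etree.fromstring(xml_data, parser)
print(entry_count)    # 3
```

<p>Target methods that are not defined (here <tt class="docutils literal">end()</tt> and <tt class="docutils literal">data()</tt>) are simply not called.</p>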
</div>
<div class="slide" id="summary-of-lesson-1">
<h1>Summary of lesson 1</h1>
<ul class="simple">
<li>parsing XML/HTML in lxml is mostly straightforward<ul>
<li>simple functions that do the job</li>
</ul>
</li>
<li>advanced use cases are pretty simple<ul>
<li>event-driven parsing using <tt class="docutils literal">iterparse()</tt></li>
<li>special parser configuration with keyword arguments<ul>
<li>configuration is generally local to a parser</li>
</ul>
</li>
</ul>
</li>
<li>BTW: parsing is <em>very</em> fast, as is serialising<ul>
<li>don't hesitate to do parse-serialise-parse cycles</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="lesson-2-generating-xml">
<h1>Lesson 2: generating XML</h1>
<blockquote>
<p><strong>The output side</strong></p>
<p>(and how to make it safe and simple)</p>
</blockquote>
</div>
<div class="slide" id="the-example-language-atom">
<h1>The example language: Atom</h1>
<p>The Atom XML format</p>
<ul class="simple">
<li>Namespace: <a class="reference external" href="http://www.w3.org/2005/Atom">http://www.w3.org/2005/Atom</a></li>
<li>IETF standard (RFC 4287) derived from RSS and friends</li>
<li>Atom feeds describe news entries and annotated links<ul>
<li>a <tt class="docutils literal">feed</tt> contains one or more <tt class="docutils literal">entry</tt> elements</li>
<li>an <tt class="docutils literal">entry</tt> contains <tt class="docutils literal">author</tt>, <tt class="docutils literal">link</tt>, <tt class="docutils literal">summary</tt> and/or <tt class="docutils literal">content</tt></li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="example-generate-xml-1">
<h1>Example: generate XML (1)</h1>
<p>The ElementMaker (or <em>E-factory</em>)</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">lxml.builder</span> <span class="kn">import</span> <span class="n">ElementMaker</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">A</span> <span class="o">=</span> <span class="n">ElementMaker</span><span class="p">(</span><span class="n">namespace</span><span class="o">=</span><span class="s2">&quot;http://www.w3.org/2005/Atom&quot;</span><span class="p">,</span>
<span class="gp">... </span>                 <span class="n">nsmap</span><span class="o">=</span><span class="p">{</span><span class="bp">None</span> <span class="p">:</span> <span class="s2">&quot;http://www.w3.org/2005/Atom&quot;</span><span class="p">})</span>
</pre></div>
<div class="incremental"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">atom</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span>
<span class="gp">... </span>  <span class="n">A</span><span class="o">.</span><span class="n">author</span><span class="p">(</span> <span class="n">A</span><span class="o">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&quot;Stefan Behnel&quot;</span><span class="p">)</span> <span class="p">),</span>
<span class="gp">... </span>  <span class="n">A</span><span class="o">.</span><span class="n">entry</span><span class="p">(</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">&quot;News from lxml&quot;</span><span class="p">),</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">link</span><span class="p">(</span><span class="n">href</span><span class="o">=</span><span class="s2">&quot;http://codespeak.net/lxml/&quot;</span><span class="p">),</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">summary</span><span class="p">(</span><span class="s2">&quot;See what&#39;s &lt;b&gt;fun&lt;/b&gt; about lxml...&quot;</span><span class="p">,</span>
<span class="gp">... </span>              <span class="nb">type</span><span class="o">=</span><span class="s2">&quot;html&quot;</span><span class="p">),</span>
<span class="gp">... </span>  <span class="p">)</span>
<span class="gp">... </span><span class="p">)</span>
</pre></div>
</div><div class="incremental"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">lxml.etree</span> <span class="kn">import</span> <span class="n">tostring</span>
<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">(</span> <span class="n">tostring</span><span class="p">(</span><span class="n">atom</span><span class="p">,</span> <span class="n">pretty_print</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="p">)</span>
</pre></div>
</div></div>
<div class="slide" id="example-generate-xml-2">
<h1>Example: generate XML (2)</h1>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">atom</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span>
<span class="gp">... </span>  <span class="n">A</span><span class="o">.</span><span class="n">author</span><span class="p">(</span> <span class="n">A</span><span class="o">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&quot;Stefan Behnel&quot;</span><span class="p">)</span> <span class="p">),</span>
<span class="gp">... </span>  <span class="n">A</span><span class="o">.</span><span class="n">entry</span><span class="p">(</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">&quot;News from lxml&quot;</span><span class="p">),</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">link</span><span class="p">(</span><span class="n">href</span><span class="o">=</span><span class="s2">&quot;http://codespeak.net/lxml/&quot;</span><span class="p">),</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">summary</span><span class="p">(</span><span class="s2">&quot;See what&#39;s &lt;b&gt;fun&lt;/b&gt; about lxml...&quot;</span><span class="p">,</span>
<span class="gp">... </span>              <span class="nb">type</span><span class="o">=</span><span class="s2">&quot;html&quot;</span><span class="p">),</span>
<span class="gp">... </span>  <span class="p">)</span>
<span class="gp">... </span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span class="nt">&lt;feed</span> <span class="na">xmlns=</span><span class="s">&quot;http://www.w3.org/2005/Atom&quot;</span><span class="nt">&gt;</span>
  <span class="nt">&lt;author&gt;</span>
    <span class="nt">&lt;name&gt;</span>Stefan Behnel<span class="nt">&lt;/name&gt;</span>
  <span class="nt">&lt;/author&gt;</span>
  <span class="nt">&lt;entry&gt;</span>
    <span class="nt">&lt;title&gt;</span>News from lxml<span class="nt">&lt;/title&gt;</span>
    <span class="nt">&lt;link</span> <span class="na">href=</span><span class="s">&quot;http://codespeak.net/lxml/&quot;</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;summary</span> <span class="na">type=</span><span class="s">&quot;html&quot;</span><span class="nt">&gt;</span>See what&#39;s <span class="ni">&amp;lt;</span>b<span class="ni">&amp;gt;</span>fun<span class="ni">&amp;lt;</span>/b<span class="ni">&amp;gt;</span>
                         about lxml...<span class="nt">&lt;/summary&gt;</span>
  <span class="nt">&lt;/entry&gt;</span>
<span class="nt">&lt;/feed&gt;</span>
</pre></div>
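<p>The same feed as a self-contained script, assuming Python 3. Passing <tt class="docutils literal">encoding="unicode"</tt> to <tt class="docutils literal">tostring()</tt> returns a <tt class="docutils literal">str</tt> instead of <tt class="docutils literal">bytes</tt>, so it prints cleanly. Note that the markup inside the summary text is escaped automatically, which is what makes generating XML this way safe:</p>

```python
from lxml.builder import ElementMaker
from lxml.etree import tostring

ATOM_NAMESPACE = "http://www.w3.org/2005/Atom"
A = ElementMaker(namespace=ATOM_NAMESPACE,
                 nsmap={None: ATOM_NAMESPACE})

atom = A.feed(
    A.author(A.name("Stefan Behnel")),
    A.entry(
        A.title("News from lxml"),
        A.link(href="http://codespeak.net/lxml/"),
        A.summary("See what's <b>fun</b> about lxml...", type="html"),
    ),
)

# str output, with the summary markup escaped as text
xml = tostring(atom, pretty_print=True, encoding="unicode")
print(xml)
```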
</div>
<div class="slide" id="be-careful-what-you-type">
<h1>Be careful what you type!</h1>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">atom</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span>
<span class="gp">... </span>  <span class="n">A</span><span class="o">.</span><span class="n">author</span><span class="p">(</span> <span class="n">A</span><span class="o">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&quot;Stefan Behnel&quot;</span><span class="p">)</span> <span class="p">),</span>
<span class="gp">... </span>  <span class="n">A</span><span class="o">.</span><span class="n">entry</span><span class="p">(</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">titel</span><span class="p">(</span><span class="s2">&quot;News from lxml&quot;</span><span class="p">),</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">link</span><span class="p">(</span><span class="n">href</span><span class="o">=</span><span class="s2">&quot;http://codespeak.net/lxml/&quot;</span><span class="p">),</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">summary</span><span class="p">(</span><span class="s2">&quot;See what&#39;s &lt;b&gt;fun&lt;/b&gt; about lxml...&quot;</span><span class="p">,</span>
<span class="gp">... </span>              <span class="nb">type</span><span class="o">=</span><span class="s2">&quot;html&quot;</span><span class="p">),</span>
<span class="gp">... </span>  <span class="p">)</span>
<span class="gp">... </span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span class="nt">&lt;feed</span> <span class="na">xmlns=</span><span class="s">&quot;http://www.w3.org/2005/Atom&quot;</span><span class="nt">&gt;</span>
  <span class="nt">&lt;author&gt;</span>
    <span class="nt">&lt;name&gt;</span>Stefan Behnel<span class="nt">&lt;/name&gt;</span>
  <span class="nt">&lt;/author&gt;</span>
  <span class="nt">&lt;entry&gt;</span>
    <span class="nt">&lt;titel&gt;</span>News from lxml<span class="nt">&lt;/titel&gt;</span>
    <span class="nt">&lt;link</span> <span class="na">href=</span><span class="s">&quot;http://codespeak.net/lxml/&quot;</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;summary</span> <span class="na">type=</span><span class="s">&quot;html&quot;</span><span class="nt">&gt;</span>See what&#39;s <span class="ni">&amp;lt;</span>b<span class="ni">&amp;gt;</span>fun<span class="ni">&amp;lt;</span>/b<span class="ni">&amp;gt;</span>
                         about lxml...<span class="nt">&lt;/summary&gt;</span>
  <span class="nt">&lt;/entry&gt;</span>
<span class="nt">&lt;/feed&gt;</span>
</pre></div>
</div>
<div class="slide" id="want-more-type-safety">
<h1>Want more 'type safety'?</h1>
<p>Write an XML generator <em>module</em> instead:</p>
<div class="highlight"><pre><span class="c1"># atomgen.py</span>

<span class="kn">from</span> <span class="nn">lxml</span> <span class="kn">import</span> <span class="n">etree</span>
<span class="kn">from</span> <span class="nn">lxml.builder</span> <span class="kn">import</span> <span class="n">ElementMaker</span>

<span class="n">ATOM_NAMESPACE</span> <span class="o">=</span> <span class="s2">&quot;http://www.w3.org/2005/Atom&quot;</span>

<span class="n">A</span> <span class="o">=</span> <span class="n">ElementMaker</span><span class="p">(</span><span class="n">namespace</span><span class="o">=</span><span class="n">ATOM_NAMESPACE</span><span class="p">,</span>
                 <span class="n">nsmap</span><span class="o">=</span><span class="p">{</span><span class="bp">None</span> <span class="p">:</span> <span class="n">ATOM_NAMESPACE</span><span class="p">})</span>

<span class="n">feed</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">feed</span>
<span class="n">entry</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">entry</span>
<span class="n">title</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">title</span>
<span class="c1"># ... and so on and so forth ...</span>


<span class="c1"># plus a little validation function: isvalid()</span>
<span class="n">isvalid</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">RelaxNG</span><span class="p">(</span><span class="nb">file</span><span class="o">=</span><span class="s2">&quot;atom.rng&quot;</span><span class="p">)</span>
</pre></div>
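<p>Since <tt class="docutils literal">atom.rng</tt> is not reproduced here, the following self-contained sketch inlines a deliberately tiny, hypothetical RELAX NG schema instead. It is <em>not</em> the real Atom schema: it only demands that a feed has entries and that each entry has an <tt class="docutils literal">id</tt> followed by a <tt class="docutils literal">title</tt>. It shows the shape of such a generator module, i.e. names bound to element factories plus a validator object that is called like a function:</p>

```python
from lxml import etree
from lxml.builder import ElementMaker

ATOM_NAMESPACE = "http://www.w3.org/2005/Atom"

A = ElementMaker(namespace=ATOM_NAMESPACE,
                 nsmap={None: ATOM_NAMESPACE})

feed = A.feed
entry = A.entry
title = A.title

# hypothetical mini schema, NOT the official Atom schema
_SCHEMA = etree.XML("""\
<element name="feed" ns="http://www.w3.org/2005/Atom"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="entry">
      <element name="id"><text/></element>
      <element name="title"><text/></element>
    </element>
  </oneOrMore>
</element>""")

# a RelaxNG object is callable: isvalid(tree) returns True or False
isvalid = etree.RelaxNG(_SCHEMA)

good = feed(entry(A.id("urn:example:1"), title("News from lxml")))
bad = feed(entry(title("News from lxml")))    # the id is missing

print(isvalid(good), isvalid(bad))            # True False
print(isvalid.error_log.last_error)           # why 'bad' was rejected
```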
</div>
<div class="slide" id="the-atom-generator-module">
<h1>The Atom generator module</h1>
<!-- >>> import sys
>>> sys.path.insert(0, "ep2008") -->
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">atomgen</span> <span class="kn">as</span> <span class="nn">A</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">atom</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span>
<span class="gp">... </span>  <span class="n">A</span><span class="o">.</span><span class="n">author</span><span class="p">(</span> <span class="n">A</span><span class="o">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&quot;Stefan Behnel&quot;</span><span class="p">)</span> <span class="p">),</span>
<span class="gp">... </span>  <span class="n">A</span><span class="o">.</span><span class="n">entry</span><span class="p">(</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">link</span><span class="p">(</span><span class="n">href</span><span class="o">=</span><span class="s2">&quot;http://codespeak.net/lxml/&quot;</span><span class="p">),</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">&quot;News from lxml&quot;</span><span class="p">),</span>
<span class="gp">... </span>    <span class="n">A</span><span class="o">.</span><span class="n">summary</span><span class="p">(</span><span class="s2">&quot;See what&#39;s &lt;b&gt;fun&lt;/b&gt; about lxml...&quot;</span><span class="p">,</span>
<span class="gp">... </span>              <span class="nb">type</span><span class="o">=</span><span class="s2">&quot;html&quot;</span><span class="p">),</span>
<span class="gp">... </span>  <span class="p">)</span>
<span class="gp">... </span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">A</span><span class="o">.</span><span class="n">isvalid</span><span class="p">(</span><span class="n">atom</span><span class="p">)</span> <span class="c1"># ok, forgot the ID&#39;s =&gt; invalid XML ...</span>
<span class="go">False</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">title</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">titel</span><span class="p">(</span><span class="s2">&quot;News from lxml&quot;</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
  <span class="c">...</span>
<span class="gr">AttributeError</span>: <span class="n">&#39;module&#39; object has no attribute &#39;titel&#39;</span>
</pre></div>
</div>
<div class="slide" id="mixing-languages-1">
<h1>Mixing languages (1)</h1>
<p>Atom can embed <em>serialised</em> HTML</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">lxml.html.builder</span> <span class="kn">as</span> <span class="nn">h</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">html_fragment</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="n">DIV</span><span class="p">(</span>
<span class="gp">... </span>  <span class="s2">&quot;this is some</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">,</span>
<span class="gp">... </span>  <span class="n">h</span><span class="o">.</span><span class="n">A</span><span class="p">(</span><span class="s2">&quot;HTML&quot;</span><span class="p">,</span> <span class="n">href</span><span class="o">=</span><span class="s2">&quot;http://w3.org/MarkUp/&quot;</span><span class="p">),</span>
<span class="gp">... </span>  <span class="s2">&quot;</span><span class="se">\n</span><span class="s2">content&quot;</span><span class="p">)</span>
</pre></div>
<div class="incremental"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">serialised_html</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">tostring</span><span class="p">(</span><span class="n">html_fragment</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">&quot;html&quot;</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">summary</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">summary</span><span class="p">(</span><span class="n">serialised_html</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="s2">&quot;html&quot;</span><span class="p">)</span>
</pre></div>
</div><div class="incremental"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">(</span><span class="n">etree</span><span class="o">.</span><span class="n">tostring</span><span class="p">(</span><span class="n">summary</span><span class="p">))</span>
<span class="go">&lt;summary xmlns=&quot;http://www.w3.org/2005/Atom&quot; type=&quot;html&quot;&gt;</span>
<span class="go">   &amp;lt;div&amp;gt;this is some</span>
<span class="go">   &amp;lt;a href=&quot;http://w3.org/MarkUp/&quot;&amp;gt;HTML&amp;lt;/a&amp;gt;</span>
<span class="go">   content&amp;lt;/div&amp;gt;</span>
<span class="go">&lt;/summary&gt;</span>
</pre></div>
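<p>A self-contained sketch of this round trip, assuming Python 3. Passing <tt class="docutils literal">encoding="unicode"</tt> makes <tt class="docutils literal">tostring()</tt> return a <tt class="docutils literal">str</tt>, which the ElementMaker accepts as text content; parsing the text back with <tt class="docutils literal">lxml.html.fragment_fromstring()</tt> shows that nothing was lost in the escaping:</p>

```python
import lxml.html
import lxml.html.builder as h
from lxml import etree
from lxml.builder import ElementMaker

ATOM_NAMESPACE = "http://www.w3.org/2005/Atom"
A = ElementMaker(namespace=ATOM_NAMESPACE, nsmap={None: ATOM_NAMESPACE})

html_fragment = h.DIV(
    "this is some\n",
    h.A("HTML", href="http://w3.org/MarkUp/"),
    "\ncontent")

# serialise the HTML into a string and embed it as (escaped) text
serialised_html = etree.tostring(html_fragment, method="html",
                                 encoding="unicode")
summary = A.summary(serialised_html, type="html")

# a consumer reads the text back and parses it as HTML again
parsed = lxml.html.fragment_fromstring(summary.text)
print(parsed.tag, parsed[0].get("href"))   # div http://w3.org/MarkUp/
```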
</div></div>
<div class="slide" id="mixing-languages-2">
<h1>Mixing languages (2)</h1>
<p>Atom can also embed non-escaped XHTML</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">deepcopy</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">xhtml_fragment</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="n">html_fragment</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">lxml.html</span> <span class="kn">import</span> <span class="n">html_to_xhtml</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">html_to_xhtml</span><span class="p">(</span><span class="n">xhtml_fragment</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">summary</span> <span class="o">=</span> <span class="n">A</span><span class="o">.</span><span class="n">summary</span><span class="p">(</span><span class="n">xhtml_fragment</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="s2">&quot;xhtml&quot;</span><span class="p">)</span>
</pre></div>
<div class="incremental"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">(</span><span class="n">etree</span><span class="o">.</span><span class="n">tostring</span><span class="p">(</span><span class="n">summary</span><span class="p">,</span> <span class="n">pretty_print</span><span class="o">=</span><span class="bp">True</span><span class="p">))</span>
<span class="go">&lt;summary xmlns=&quot;http://www.w3.org/2005/Atom&quot; type=&quot;xhtml&quot;&gt;</span>
<span class="go">  &lt;html:div xmlns:html=&quot;http://www.w3.org/1999/xhtml&quot;&gt;this is some</span>
<span class="go">  &lt;html:a href=&quot;http://w3.org/MarkUp/&quot;&gt;HTML&lt;/html:a&gt;</span>
<span class="go">  content&lt;/html:div&gt;</span>
<span class="go">&lt;/summary&gt;</span>
</pre></div>
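<p>A quick self-contained check of what <tt class="docutils literal">html_to_xhtml()</tt> does, assuming Python 3: it modifies the tree in place and moves every tag into the XHTML namespace, which is why the slide works on a <tt class="docutils literal">deepcopy()</tt> and leaves the original HTML fragment untouched:</p>

```python
from copy import deepcopy

import lxml.html.builder as h
from lxml.html import html_to_xhtml

html_fragment = h.DIV("this is some\n",
                      h.A("HTML", href="http://w3.org/MarkUp/"),
                      "\ncontent")

xhtml_fragment = deepcopy(html_fragment)
html_to_xhtml(xhtml_fragment)          # modifies the copy in place

print(html_fragment.tag)               # div
print(xhtml_fragment.tag)              # {http://www.w3.org/1999/xhtml}div
print(xhtml_fragment[0].tag)           # {http://www.w3.org/1999/xhtml}a
```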
</div></div>
<div class="slide" id="summary-of-lesson-2">
<h1>Summary of lesson 2</h1>
<ul class="simple">
<li>generating XML is easy<ul>
<li>use the ElementMaker</li>
</ul>
</li>
<li>wrap it in a module that provides<ul>
<li>the target namespace</li>
<li>an ElementMaker name for each language element</li>
<li>a validator</li>
<li>maybe additional helper functions</li>
</ul>
</li>
<li>mixing languages is easy<ul>
<li>define a generator module for each</li>
</ul>
</li>
</ul>
<p>... this is all you need for the <em>output</em> side of XML languages</p>
</div>
<div class="slide" id="lesson-3-designing-xml-apis">
<h1>Lesson 3: Designing XML APIs</h1>
<blockquote>
<p><strong>The Element API</strong></p>
<p>(and how to make it the way <em>you</em> want)</p>
</blockquote>
</div>
<div class="slide" id="trees-in-c-and-in-python">
<h1>Trees in C and in Python</h1>
<ul class="simple">
<li>Trees have two representations:<ul>
<li>a plain, complete, low-level C tree provided by libxml2</li>
<li>a set of Python Element proxies, each representing one element</li>
</ul>
</li>
<li>Proxies are created on-the-fly:<ul>
<li>lxml creates an Element object for a C node on request</li>
<li>proxies are garbage collected when going out of scope</li>
<li>XML trees are garbage collected when deleting the last proxy</li>
</ul>
</li>
</ul>
<img alt="ep2008/proxies.png" class="center" src="ep2008/proxies.png" />
</div>
<div class="slide" id="mapping-python-classes-to-nodes">
<h1>Mapping Python classes to nodes</h1>
<ul class="simple">
<li>Proxy classes can be assigned to XML nodes <em>by user code</em><ul>
<li>lxml tells you about a node, you return a class</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="example-a-simple-element-class-1">
<h1>Example: a simple Element class (1)</h1>
<ul>
<li><p class="first">define a subclass of ElementBase</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="k">class</span> <span class="nc">HonkElement</span><span class="p">(</span><span class="n">etree</span><span class="o">.</span><span class="n">ElementBase</span><span class="p">):</span>
<span class="gp">... </span>   <span class="nd">@property</span>
<span class="gp">... </span>   <span class="k">def</span> <span class="nf">honking</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="gp">... </span>      <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;honking&#39;</span><span class="p">)</span> <span class="o">==</span> <span class="s1">&#39;true&#39;</span>
</pre></div>
</li>
<li><p class="first">let it replace the default Element class</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">lookup</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">ElementDefaultClassLookup</span><span class="p">(</span>
<span class="gp">... </span>                            <span class="n">element</span><span class="o">=</span><span class="n">HonkElement</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">XMLParser</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">set_element_class_lookup</span><span class="p">(</span><span class="n">lookup</span><span class="p">)</span>
</pre></div>
</li>
</ul>
</div>
<div class="slide" id="example-a-simple-element-class-2">
<h1>Example: a simple Element class (2)</h1>
<ul>
<li><p class="first">use the new Element class</p>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">root</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">XML</span><span class="p">(</span><span class="s1">&#39;&lt;root&gt;&lt;honk honking=&quot;true&quot;/&gt;&lt;/root&gt;&#39;</span><span class="p">,</span>
<span class="gp">... </span>                 <span class="n">parser</span><span class="p">)</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">root</span><span class="o">.</span><span class="n">honking</span>
<span class="go">False</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">root</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">honking</span>
<span class="go">True</span>
</pre></div>
</li>
</ul>
</div>
<div class="slide" id="id4">
<h1>Mapping Python classes to nodes</h1>
<ul class="simple">
<li>The Element class lookup<ul>
<li>lxml tells you about a node, you return a class</li>
<li>no restrictions on lookup algorithm</li>
<li>each parser can use a different class lookup scheme</li>
<li>lookup schemes can be chained through fallbacks</li>
</ul>
</li>
<li>Classes can be selected based on<ul>
<li>the node type (element, comment or processing instruction)<ul>
<li><tt class="docutils literal">ElementDefaultClassLookup()</tt></li>
</ul>
</li>
<li>the namespaced node name<ul>
<li><tt class="docutils literal">CustomElementClassLookup()</tt> + a fallback</li>
<li><tt class="docutils literal">ElementNamespaceClassLookup()</tt> + a fallback</li>
</ul>
</li>
<li>the value of an attribute (e.g. <tt class="docutils literal">id</tt> or <tt class="docutils literal">class</tt>)<ul>
<li><tt class="docutils literal">AttributeBasedElementClassLookup()</tt> + a fallback</li>
</ul>
</li>
<li>read-only inspection of the tree<ul>
<li><tt class="docutils literal">PythonElementClassLookup()</tt> + a fallback</li>
</ul>
</li>
</ul>
</li>
</ul>
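<p>As an example of the namespace-based scheme, here is a minimal sketch, assuming Python 3 and made-up <tt class="docutils literal">FeedElement</tt>/<tt class="docutils literal">EntryElement</tt> classes. Classes are registered per local name within a namespace, and elements that are not registered fall back to lxml's default classes:</p>

```python
from lxml import etree

ATOM_NAMESPACE = "http://www.w3.org/2005/Atom"
_ATOM_NS = "{%s}" % ATOM_NAMESPACE

class FeedElement(etree.ElementBase):
    @property
    def entries(self):
        return self.findall(_ATOM_NS + "entry")

class EntryElement(etree.ElementBase):
    @property
    def title(self):
        # text of the Atom title child, or None if there is none
        return self.findtext(_ATOM_NS + "title")

# select classes by namespace and local name, with the default
# classes as fallback for everything that is not registered
lookup = etree.ElementNamespaceClassLookup(etree.ElementDefaultClassLookup())
atom_classes = lookup.get_namespace(ATOM_NAMESPACE)
atom_classes["feed"] = FeedElement
atom_classes["entry"] = EntryElement

parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

feed = etree.XML('<feed xmlns="http://www.w3.org/2005/Atom">'
                 '<entry><title>one</title></entry>'
                 '<entry><title>two</title></entry>'
                 '</feed>', parser)

print(type(feed).__name__)              # FeedElement
print([e.title for e in feed.entries])  # ['one', 'two']
```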
</div>
<div class="slide" id="designing-an-atom-api">
<h1>Designing an Atom API</h1>
<ul>
<li><p class="first">a feed is a container for entries</p>
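<div class="highlight"><pre><span class="c1"># atom.py</span>

<span class="kn">from</span> <span class="nn">lxml</span> <span class="kn">import</span> <span class="n">etree</span>

<span class="n">ATOM_NAMESPACE</span> <span class="o">=</span> <span class="s2">&quot;http://www.w3.org/2005/Atom&quot;</span>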
<span class="n">_ATOM_NS</span> <span class="o">=</span> <span class="s2">&quot;{</span><span class="si">%s</span><span class="s2">}&quot;</span> <span class="o">%</span> <span class="n">ATOM_NAMESPACE</span>

<span class="k">class</span> <span class="nc">FeedElement</span><span class="p">(</span><span class="n">etree</span><span class="o">.</span><span class="n">ElementBase</span><span class="p">):</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">entries</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="n">_ATOM_NS</span> <span class="o">+</span> <span class="s2">&quot;entry&quot;</span><span class="p">)</span>
</pre></div>
</li>
<li><p class="first">it also has a couple of meta-data children, e.g. <tt class="docutils literal">title</tt></p>
<div class="highlight"><pre><span class="k">class</span> <span class="nc">FeedElement</span><span class="p">(</span><span class="n">etree</span><span class="o">.</span><span class="n">ElementBase</span><span class="p">):</span>
    <span class="c1"># ...</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">title</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="s2">&quot;return the title or None&quot;</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="n">_ATOM_NS</span> <span class="o">+</span> <span class="s2">&quot;title&quot;</span><span class="p">)</span>
</pre></div>
</li>
</ul>
</div>
<div class="slide" id="consider-lxml-objectify">
<h1>Consider lxml.objectify</h1>
<ul class="simple">
<li>ready-to-use, generic Python object API for XML</li>
</ul>
<div class="highlight"><pre><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">lxml</span> <span class="kn">import</span> <span class="n">objectify</span>

<span class="o">&gt;&gt;&gt;</span> <span class="n">feed</span> <span class="o">=</span> <span class="n">objectify</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">&quot;atom-example.xml&quot;</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">feed</span><span class="o">.</span><span class="n">title</span><span class="p">)</span>
<span class="n">Example</span> <span class="n">Feed</span>

<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">([</span><span class="n">entry</span><span class="o">.</span><span class="n">title</span> <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">feed</span><span class="o">.</span><span class="n">entry</span><span class="p">])</span>
<span class="p">[</span><span class="s1">&#39;Atom-Powered Robots Run Amok&#39;</span><span class="p">]</span>

<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">feed</span><span class="o">.</span><span class="n">entry</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">title</span><span class="p">)</span>
<span class="n">Atom</span><span class="o">-</span><span class="n">Powered</span> <span class="n">Robots</span> <span class="n">Run</span> <span class="n">Amok</span>
</pre></div>
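<p>A self-contained variant that parses an inline, made-up document with <tt class="docutils literal">objectify.fromstring()</tt>, which returns the root element directly. Child lookup reuses the parent's namespace, so the Atom namespace never has to be spelled out; accessing a repeated child name gives the first match, while iterating over it yields all siblings of that name:</p>

```python
from lxml import objectify

atom_xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry><title>Atom-Powered Robots Run Amok</title></entry>
  <entry><title>Another entry</title></entry>
</feed>"""

feed = objectify.fromstring(atom_xml)

print(feed.title)          # Example Feed
print(len(feed.entry))     # 2 (number of entry siblings)
titles = [entry.title.text for entry in feed.entry]
print(titles)
```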
</div>
<div class="slide" id="still-room-for-more-convenience">
<h1>Still room for more convenience</h1>
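<div class="highlight"><pre><span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">chain</span>
<span class="kn">from</span> <span class="nn">lxml</span> <span class="kn">import</span> <span class="n">etree</span><span class="p">,</span> <span class="n">objectify</span>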

<span class="k">class</span> <span class="nc">FeedElement</span><span class="p">(</span><span class="n">objectify</span><span class="o">.</span><span class="n">ObjectifiedElement</span><span class="p">):</span>

    <span class="k">def</span> <span class="nf">addIDs</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="s2">&quot;initialise the IDs of feed and entries&quot;</span>

        <span class="k">for</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">chain</span><span class="p">([</span><span class="bp">self</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">entry</span><span class="p">):</span>
            <span class="k">if</span> <span class="n">element</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="n">_ATOM_NS</span> <span class="o">+</span> <span class="s2">&quot;id&quot;</span><span class="p">)</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
                <span class="nb">id</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">SubElement</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">_ATOM_NS</span> <span class="o">+</span> <span class="s2">&quot;id&quot;</span><span class="p">)</span>
                <span class="nb">id</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">make_guid</span><span class="p">()</span>
</pre></div>
</div>
<div class="slide" id="incremental-api-design">
<h1>Incremental API design</h1>
<ul class="simple">
<li>choose an XML API to start with<ul>
<li>lxml.etree is general purpose</li>
<li>lxml.objectify is nice for document-style XML</li>
</ul>
</li>
<li>fix Elements that really need some API sugar<ul>
<li>dict-mappings to children with specific content/attributes</li>
<li>properties for specially typed attributes or child values</li>
<li>simplified access to varying content types of an element</li>
<li>shortcuts for unnecessarily deep subtrees</li>
</ul>
</li>
<li>ignore what works well enough with the Element API<ul>
<li>lists of homogeneous children -&gt; Element iteration</li>
<li>string attributes -&gt; .get()/.set()</li>
</ul>
</li>
<li>let the API grow at your fingertips<ul>
<li>play with it and test use cases</li>
<li>avoid &quot;I want because I can&quot; feature explosion!</li>
</ul>
</li>
</ul>
</div>
<div class="slide" id="setting-up-the-element-mapping">
<h1>Setting up the Element mapping</h1>
<p>Atom has a namespace =&gt; let lxml's namespace class lookup do the mapping</p>
<div class="highlight"><pre><span class="c1"># ...</span>
<span class="n">_atom_lookup</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">ElementNamespaceClassLookup</span><span class="p">(</span>
                  <span class="n">objectify</span><span class="o">.</span><span class="n">ObjectifyElementClassLookup</span><span class="p">())</span>

<span class="c1"># map the classes to tag names</span>
<span class="n">ns</span> <span class="o">=</span> <span class="n">_atom_lookup</span><span class="o">.</span><span class="n">get_namespace</span><span class="p">(</span><span class="n">ATOM_NAMESPACE</span><span class="p">)</span>
<span class="n">ns</span><span class="p">[</span><span class="s2">&quot;feed&quot;</span><span class="p">]</span>  <span class="o">=</span> <span class="n">FeedElement</span>
<span class="n">ns</span><span class="p">[</span><span class="s2">&quot;entry&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">EntryElement</span>
<span class="c1"># ... and so on</span>
<span class="c1"># or use ns.update(vars()) with appropriate class names</span>

<span class="c1"># create a parser that does some whitespace cleanup</span>
<span class="n">atom_parser</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">XMLParser</span><span class="p">(</span><span class="n">remove_blank_text</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

<span class="c1"># make it use our Atom classes</span>
<span class="n">atom_parser</span><span class="o">.</span><span class="n">set_element_class_lookup</span><span class="p">(</span><span class="n">_atom_lookup</span><span class="p">)</span>

<span class="c1"># and let users parse documents with this setup</span>
<span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">source</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">etree</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">atom_parser</span><span class="p">)</span>
</pre></div>
</div>
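<div class="slide" id="end-to-end-mapping-sketch">
<h1>The mapping, end to end</h1>
<p>The slide's comment mentions the <tt class="docutils literal">ns.update(vars())</tt> variant. A self-contained sketch of it, plus a round trip through the <tt class="docutils literal">parse()</tt> helper: classes named after the Atom tags live in a hypothetical holder class <tt class="docutils literal">AtomClasses</tt>, and only its public names are passed to <tt class="docutils literal">update()</tt>. The test feed is serialised with objectify's ElementMaker and read back from a BytesIO, so no file is needed; all class and method names here are illustrative.</p>

```python
from io import BytesIO

from lxml import etree, objectify

ATOM_NAMESPACE = "http://www.w3.org/2005/Atom"


class AtomClasses:
    """Hypothetical holder whose class names double as Atom tag names."""

    class feed(objectify.ObjectifiedElement):
        def entry_titles(self):
            entries = self.iterchildren("{%s}entry" % ATOM_NAMESPACE)
            return [str(entry.title) for entry in entries]

    class entry(objectify.ObjectifiedElement):
        pass


_atom_lookup = etree.ElementNamespaceClassLookup(
    objectify.ObjectifyElementClassLookup())
ns = _atom_lookup.get_namespace(ATOM_NAMESPACE)
# the vars() variant: register every public name of the holder class
ns.update((name, cls) for name, cls in vars(AtomClasses).items()
          if not name.startswith("_"))

atom_parser = etree.XMLParser(remove_blank_text=True)
atom_parser.set_element_class_lookup(_atom_lookup)


def parse(source):
    return etree.parse(source, atom_parser)


# serialise a tiny test feed and read it back through parse()
E = objectify.ElementMaker(annotate=False, namespace=ATOM_NAMESPACE,
                           nsmap={None: ATOM_NAMESPACE})
data = etree.tostring(E.feed(E.title("Example Feed"),
                             E.entry(E.title("Atom-Powered Robots Run Amok"))))

feed = parse(BytesIO(data)).getroot()
print(type(feed).__name__)   # feed
print(feed.entry_titles())   # ['Atom-Powered Robots Run Amok']
```

<p>Registering classes by name this way only works if every class name exactly matches its tag name; tags without a registered class still fall back to objectify.</p>
</div>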
<div class="slide" id="using-your-new-atom-api">
<h1>Using your new Atom API</h1>
<div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">atom</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">feed</span> <span class="o">=</span> <span class="n">atom</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">&quot;ep2008/atom-example.xml&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getroot</span><span class="p">()</span>

<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">feed</span><span class="o">.</span><span class="n">entry</span><span class="p">))</span>
<span class="go">1</span>
<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">([</span><span class="n">entry</span><span class="o">.</span><span class="n">title</span> <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">feed</span><span class="o">.</span><span class="n">entry</span><span class="p">])</span>
<span class="go">[&#39;Atom-Powered Robots Run Amok&#39;]</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">link_tag</span> <span class="o">=</span> <span class="s2">&quot;{</span><span class="si">%s</span><span class="s2">}link&quot;</span> <span class="o">%</span> <span class="n">atom</span><span class="o">.</span><span class="n">ATOM_NAMESPACE</span>
<span class="gp">&gt;&gt;&gt; </span><span class="k">print</span><span class="p">([</span><span class="n">link</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;href&quot;</span><span class="p">)</span> <span class="k">for</span> <span class="n">link</span> <span class="ow">in</span> <span class="n">feed</span><span class="o">.</span><span class="n">iter</span><span class="p">(</span><span class="n">link_tag</span><span class="p">)])</span>
<span class="go">[&#39;http://example.org/&#39;, &#39;http://example.org/2003/12/13/atom03&#39;]</span>
</pre></div>
</div>
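<div class="slide" id="xpath-alternative-sketch">
<h1>The same query with XPath</h1>
<p>The link query in the session above can also be written as an XPath expression. A self-contained sketch compares both forms on a small in-memory tree built with lxml.etree; the tree only mirrors the two hrefs printed above, and the <tt class="docutils literal">atom</tt> prefix is an arbitrary name that exists only inside the query.</p>

```python
from lxml import etree

ATOM_NAMESPACE = "http://www.w3.org/2005/Atom"
link_tag = "{%s}link" % ATOM_NAMESPACE

# in-memory stand-in for the feed: one feed link, one entry link
feed = etree.Element("{%s}feed" % ATOM_NAMESPACE, nsmap={None: ATOM_NAMESPACE})
etree.SubElement(feed, link_tag, href="http://example.org/")
entry = etree.SubElement(feed, "{%s}entry" % ATOM_NAMESPACE)
etree.SubElement(entry, link_tag, href="http://example.org/2003/12/13/atom03")

# 1) tree iteration with a namespaced tag, in document order
print([link.get("href") for link in feed.iter(link_tag)])
# ['http://example.org/', 'http://example.org/2003/12/13/atom03']

# 2) a compiled, reusable XPath object; the prefix is local to the query
find_hrefs = etree.XPath("//atom:link/@href", namespaces={"atom": ATOM_NAMESPACE})
print(find_hrefs(feed))
# ['http://example.org/', 'http://example.org/2003/12/13/atom03']
```

<p>Compiling the expression once with <tt class="docutils literal">etree.XPath</tt> gives a reusable callable, which pays off when the same query runs over many documents.</p>
</div>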
<div class="slide" id="summary-of-lesson-3">
<h1>Summary of lesson 3</h1>
<p>To implement an XML API ...</p>
<ol class="arabic simple">
<li>start off with lxml's Element API<ul>
<li>or take a look at the object API of lxml.objectify</li>
</ul>
</li>
<li>specialise it into a set of custom Element classes</li>
<li>map them to XML tags using one of the lookup schemes</li>
<li>improve the API incrementally while using it<ul>
<li>discover inconveniences and beautify them</li>
<li>avoid putting effort into things that already work</li>
</ul>
</li>
</ol>
</div>
<div class="slide" id="conclusion">
<h1>Conclusion</h1>
<p>lxml ...</p>
<ul class="simple">
<li>provides a convenient set of tools for XML and HTML<ul>
<li>parsing</li>
<li>generating</li>
<li>working with in-memory trees</li>
</ul>
</li>
<li>follows Python idioms wherever possible<ul>
<li>highly extensible through wrapping and subclassing</li>
<li>callable objects for XPath, CSS selectors, XSLT, schemas</li>
<li>iteration for tree traversal (even while parsing)</li>
<li>list-/dict-like APIs, properties, keyword arguments, ...</li>
</ul>
</li>
<li>makes extension and specialisation easy<ul>
<li>write a special XML generator module in trivial code</li>
<li>write your own XML API incrementally on-the-fly</li>
</ul>
</li>
</ul>
</div>
</div>
</body>
</html>