<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Sanitization &mdash; feedparser 5.1.3 documentation</title>
    
    <link rel="stylesheet" href="_static/default.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <link rel="stylesheet" href="_static/feedparser.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '',
        VERSION:     '5.1.3',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="feedparser 5.1.3 documentation" href="index.html" />
    <link rel="up" title="Advanced Features" href="advanced.html" />
    <link rel="next" title="Content Normalization" href="content-normalization.html" />
    <link rel="prev" title="Date Parsing" href="date-parsing.html" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="content-normalization.html" title="Content Normalization"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="date-parsing.html" title="Date Parsing"
             accesskey="P">previous</a> |</li>
        <li><a href="index.html">feedparser 5.1.3 documentation</a> &raquo;</li>
          <li><a href="advanced.html" accesskey="U">Advanced Features</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="sanitization">
<span id="advanced-sanitization"></span><h1>Sanitization<a class="headerlink" href="#sanitization" title="Permalink to this headline">¶</a></h1>
<p>Most feeds embed <abbr title="HyperText Markup Language">HTML</abbr> markup within feed
elements.  Some feeds even embed other types of markup, such as <abbr title="Scalable Vector Graphics">SVG</abbr> or <abbr title="Mathematical Markup Language">MathML</abbr>.
Since many feed aggregators use a web browser (or browser component) to display
content, <strong class="program">Universal Feed Parser</strong> sanitizes embedded markup to remove
things that could pose security risks.</p>
<p>These elements are sanitized by default:</p>
<ul class="simple">
<li><a class="reference internal" href="reference-entry-content.html#reference-entry-content"><em>entries[i].content</em></a></li>
<li><a class="reference internal" href="reference-entry-summary.html#reference-entry-summary"><em>entries[i].summary</em></a></li>
<li><a class="reference internal" href="reference-entry-title.html#reference-entry-title"><em>entries[i].title</em></a></li>
<li><a class="reference internal" href="reference-feed-info.html#reference-feed-info"><em>feed.info</em></a></li>
<li><a class="reference internal" href="reference-feed-rights.html#reference-feed-rights"><em>feed.rights</em></a></li>
<li><a class="reference internal" href="reference-feed-subtitle.html#reference-feed-subtitle"><em>feed.subtitle</em></a></li>
<li><a class="reference internal" href="reference-feed-title.html#reference-feed-title"><em>feed.title</em></a></li>
</ul>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If the content is declared to be (or is determined to be)
<em class="mimetype">text/plain</em>, it will not be sanitized, to avoid data loss.
Check the content type before rendering, for example in
<tt class="xref py py-attr docutils literal"><span class="pre">entries[i].summary_detail.type</span></tt>. If it is <em class="mimetype">text/plain</em>, the
content has not been sanitized, and you should HTML-escape it before
rendering it.</p>
</div>
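<p>As a minimal sketch of the check described in the note, assuming a detail mapping with <tt class="docutils literal"><span class="pre">type</span></tt> and <tt class="docutils literal"><span class="pre">value</span></tt> keys, plain text can be escaped with the standard library before it is placed in a page. The helper name <tt class="docutils literal"><span class="pre">render_detail</span></tt> is hypothetical and is not part of the library.</p>

```python
import html


def render_detail(detail):
    """Return a string that is safe to embed in an HTML page.

    Assumes `detail` is a mapping with "type" and "value" keys,
    shaped like entries[i].summary_detail.
    """
    value = detail.get("value", "")
    if detail.get("type") == "text/plain":
        # Plain text was not sanitized, so escape it before rendering.
        return html.escape(value)
    # Other content types were already sanitized by the parser.
    return value
```

<p>It could be called as <tt class="docutils literal"><span class="pre">render_detail(entries[i].summary_detail)</span></tt>; markup types pass through unchanged because they have already been sanitized.</p>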
<div class="section" id="html-sanitization">
<span id="advanced-sanitization-html"></span><h2><abbr title="HyperText Markup Language">HTML</abbr> Sanitization<a class="headerlink" href="#html-sanitization" title="Permalink to this headline">¶</a></h2>
<p>The following <abbr title="HyperText Markup Language">HTML</abbr> elements are allowed by
default (all others are stripped):</p>
<table class="hlist"><tr><td><ul class="simple">
<li>a</li>
<li>abbr</li>
<li>acronym</li>
<li>address</li>
<li>area</li>
<li>article</li>
<li>aside</li>
<li>audio</li>
<li>b</li>
<li>big</li>
<li>blockquote</li>
<li>br</li>
<li>button</li>
<li>canvas</li>
<li>caption</li>
<li>center</li>
<li>cite</li>
<li>code</li>
<li>col</li>
<li>colgroup</li>
<li>command</li>
<li>datagrid</li>
<li>datalist</li>
<li>dd</li>
<li>del</li>
<li>details</li>
<li>dfn</li>
<li>dialog</li>
<li>dir</li>
<li>div</li>
<li>dl</li>
<li>dt</li>
<li>em</li>
</ul>
</td><td><ul class="simple">
<li>event-source</li>
<li>fieldset</li>
<li>figure</li>
<li>font</li>
<li>footer</li>
<li>form</li>
<li>h1</li>
<li>h2</li>
<li>h3</li>
<li>h4</li>
<li>h5</li>
<li>h6</li>
<li>header</li>
<li>hr</li>
<li>i</li>
<li>img</li>
<li>input</li>
<li>ins</li>
<li>kbd</li>
<li>keygen</li>
<li>label</li>
<li>legend</li>
<li>li</li>
<li>m</li>
<li>map</li>
<li>menu</li>
<li>meter</li>
<li>multicol</li>
<li>nav</li>
<li>nextid</li>
<li>noscript</li>
<li>ol</li>
<li>optgroup</li>
</ul>
</td><td><ul class="simple">
<li>option</li>
<li>output</li>
<li>p</li>
<li>pre</li>
<li>progress</li>
<li>q</li>
<li>s</li>
<li>samp</li>
<li>section</li>
<li>select</li>
<li>small</li>
<li>sound</li>
<li>source</li>
<li>spacer</li>
<li>span</li>
<li>strike</li>
<li>strong</li>
<li>sub</li>
<li>sup</li>
<li>table</li>
<li>tbody</li>
<li>td</li>
<li>textarea</li>
<li>tfoot</li>
<li>th</li>
<li>thead</li>
<li>time</li>
<li>tr</li>
<li>tt</li>
<li>u</li>
<li>ul</li>
<li>var</li>
<li>video</li>
</ul>
</td></tr></table>
<p>The following <abbr title="HyperText Markup Language">HTML</abbr> attributes are allowed
by default (all others are stripped):</p>
<table class="hlist"><tr><td><ul class="simple">
<li>abbr</li>
<li>accept</li>
<li>accept-charset</li>
<li>accesskey</li>
<li>action</li>
<li>align</li>
<li>alt</li>
<li>autocomplete</li>
<li>autofocus</li>
<li>autoplay</li>
<li>axis</li>
<li>background</li>
<li>balance</li>
<li>bgcolor</li>
<li>bgproperties</li>
<li>border</li>
<li>bordercolor</li>
<li>bordercolordark</li>
<li>bordercolorlight</li>
<li>bottompadding</li>
<li>cellpadding</li>
<li>cellspacing</li>
<li>ch</li>
<li>challenge</li>
<li>char</li>
<li>charoff</li>
<li>charset</li>
<li>checked</li>
<li>choff</li>
<li>cite</li>
<li>class</li>
<li>clear</li>
<li>color</li>
<li>cols</li>
<li>colspan</li>
<li>compact</li>
<li>contenteditable</li>
<li>coords</li>
<li>data</li>
<li>datafld</li>
<li>datapagesize</li>
<li>datasrc</li>
<li>datetime</li>
<li>default</li>
<li>delay</li>
<li>dir</li>
<li>disabled</li>
</ul>
</td><td><ul class="simple">
<li>draggable</li>
<li>dynsrc</li>
<li>enctype</li>
<li>end</li>
<li>face</li>
<li>for</li>
<li>form</li>
<li>frame</li>
<li>galleryimg</li>
<li>gutter</li>
<li>headers</li>
<li>height</li>
<li>hidden</li>
<li>hidefocus</li>
<li>high</li>
<li>href</li>
<li>hreflang</li>
<li>hspace</li>
<li>icon</li>
<li>id</li>
<li>inputmode</li>
<li>ismap</li>
<li>keytype</li>
<li>label</li>
<li>lang</li>
<li>leftspacing</li>
<li>list</li>
<li>longdesc</li>
<li>loop</li>
<li>loopcount</li>
<li>loopend</li>
<li>loopstart</li>
<li>low</li>
<li>lowsrc</li>
<li>max</li>
<li>maxlength</li>
<li>media</li>
<li>method</li>
<li>min</li>
<li>multiple</li>
<li>name</li>
<li>nohref</li>
<li>noshade</li>
<li>nowrap</li>
<li>open</li>
<li>optimum</li>
<li>pattern</li>
</ul>
</td><td><ul class="simple">
<li>ping</li>
<li>point-size</li>
<li>poster</li>
<li>pqg</li>
<li>preload</li>
<li>prompt</li>
<li>radiogroup</li>
<li>readonly</li>
<li>rel</li>
<li>repeat-max</li>
<li>repeat-min</li>
<li>replace</li>
<li>required</li>
<li>rev</li>
<li>rightspacing</li>
<li>rows</li>
<li>rowspan</li>
<li>rules</li>
<li>scope</li>
<li>selected</li>
<li>shape</li>
<li>size</li>
<li>span</li>
<li>src</li>
<li>start</li>
<li>step</li>
<li>summary</li>
<li>suppress</li>
<li>tabindex</li>
<li>target</li>
<li>template</li>
<li>title</li>
<li>toppadding</li>
<li>type</li>
<li>unselectable</li>
<li>urn</li>
<li>usemap</li>
<li>valign</li>
<li>value</li>
<li>variable</li>
<li>volume</li>
<li>vrml</li>
<li>vspace</li>
<li>width</li>
<li>wrap</li>
<li>xml:lang</li>
</ul>
</td></tr></table>
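<p>The rule that everything not listed is stripped can be illustrated with the standard library's <tt class="docutils literal"><span class="pre">html.parser</span></tt> module. This is only a sketch under stated assumptions: it uses a small, abbreviated subset of the lists above, it reports what would be stripped instead of rewriting the markup, and it is not the library's own sanitizer. The names <tt class="docutils literal"><span class="pre">StripReport</span></tt> and <tt class="docutils literal"><span class="pre">report</span></tt> are hypothetical.</p>

```python
from html.parser import HTMLParser

# Abbreviated, illustrative subsets of the lists above (not the full whitelist).
ALLOWED_ELEMENTS = {"a", "b", "em", "p", "span", "strong"}
ALLOWED_ATTRIBUTES = {"class", "href", "title"}


class StripReport(HTMLParser):
    """Record which elements and attributes a whitelist would strip."""

    def __init__(self):
        super().__init__()
        self.stripped_elements = []
        self.stripped_attributes = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_ELEMENTS:
            # Unknown element: the whole tag would be dropped.
            self.stripped_elements.append(tag)
            return
        for name, _value in attrs:
            if name not in ALLOWED_ATTRIBUTES:
                # Known element, unknown attribute: only the attribute goes.
                self.stripped_attributes.append((tag, name))


def report(markup):
    """Return (stripped element names, stripped (element, attribute) pairs)."""
    parser = StripReport()
    parser.feed(markup)
    parser.close()
    return parser.stripped_elements, parser.stripped_attributes
```

<p>Because the check is a membership test against known-safe names, an element or attribute that nobody anticipated is rejected by default.</p>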
</div>
<div class="section" id="svg-sanitization">
<span id="advanced-sanitization-svg"></span><h2><abbr title="Scalable Vector Graphics">SVG</abbr> Sanitization<a class="headerlink" href="#svg-sanitization" title="Permalink to this headline">¶</a></h2>
<p>The following <abbr title="Scalable Vector Graphics">SVG</abbr> elements are allowed by default (all others are stripped):</p>
<table class="hlist"><tr><td><ul class="simple">
<li>a</li>
<li>animate</li>
<li>animateColor</li>
<li>animateMotion</li>
<li>animateTransform</li>
<li>circle</li>
<li>defs</li>
<li>desc</li>
<li>ellipse</li>
<li>font-face</li>
<li>font-face-name</li>
<li>font-face-src</li>
</ul>
</td><td><ul class="simple">
<li>foreignObject</li>
<li>g</li>
<li>glyph</li>
<li>hkern</li>
<li>line</li>
<li>linearGradient</li>
<li>marker</li>
<li>metadata</li>
<li>missing-glyph</li>
<li>mpath</li>
<li>path</li>
<li>polygon</li>
</ul>
</td><td><ul class="simple">
<li>polyline</li>
<li>radialGradient</li>
<li>rect</li>
<li>set</li>
<li>stop</li>
<li>svg</li>
<li>switch</li>
<li>text</li>
<li>title</li>
<li>tspan</li>
<li>use</li>
</ul>
</td></tr></table>
<p>The following <abbr title="Scalable Vector Graphics">SVG</abbr> attributes are allowed by
default (all others are stripped):</p>
<table class="hlist"><tr><td><ul class="simple">
<li>accent-height</li>
<li>accumulate</li>
<li>additive</li>
<li>alphabetic</li>
<li>arabic-form</li>
<li>ascent</li>
<li>attributeName</li>
<li>attributeType</li>
<li>baseProfile</li>
<li>bbox</li>
<li>begin</li>
<li>by</li>
<li>calcMode</li>
<li>cap-height</li>
<li>class</li>
<li>color</li>
<li>color-rendering</li>
<li>content</li>
<li>cx</li>
<li>cy</li>
<li>d</li>
<li>descent</li>
<li>display</li>
<li>dur</li>
<li>dx</li>
<li>dy</li>
<li>end</li>
<li>fill</li>
<li>fill-opacity</li>
<li>fill-rule</li>
<li>font-family</li>
<li>font-size</li>
<li>font-stretch</li>
<li>font-style</li>
<li>font-variant</li>
<li>font-weight</li>
<li>from</li>
<li>fx</li>
<li>fy</li>
<li>g1</li>
<li>g2</li>
<li>glyph-name</li>
<li>gradientUnits</li>
<li>hanging</li>
<li>height</li>
<li>horiz-adv-x</li>
<li>horiz-origin-x</li>
</ul>
</td><td><ul class="simple">
<li>id</li>
<li>ideographic</li>
<li>k</li>
<li>keyPoints</li>
<li>keySplines</li>
<li>keyTimes</li>
<li>lang</li>
<li>marker-end</li>
<li>marker-mid</li>
<li>marker-start</li>
<li>markerHeight</li>
<li>markerUnits</li>
<li>markerWidth</li>
<li>mathematical</li>
<li>max</li>
<li>min</li>
<li>name</li>
<li>offset</li>
<li>opacity</li>
<li>orient</li>
<li>origin</li>
<li>overline-position</li>
<li>overline-thickness</li>
<li>panose-1</li>
<li>path</li>
<li>pathLength</li>
<li>points</li>
<li>preserveAspectRatio</li>
<li>r</li>
<li>refX</li>
<li>refY</li>
<li>repeatCount</li>
<li>repeatDur</li>
<li>requiredExtensions</li>
<li>requiredFeatures</li>
<li>restart</li>
<li>rotate</li>
<li>rx</li>
<li>ry</li>
<li>slope</li>
<li>stemh</li>
<li>stemv</li>
<li>stop-color</li>
<li>stop-opacity</li>
<li>strikethrough-position</li>
<li>strikethrough-thickness</li>
<li>stroke</li>
</ul>
</td><td><ul class="simple">
<li>stroke-dasharray</li>
<li>stroke-dashoffset</li>
<li>stroke-linecap</li>
<li>stroke-linejoin</li>
<li>stroke-miterlimit</li>
<li>stroke-opacity</li>
<li>stroke-width</li>
<li>systemLanguage</li>
<li>target</li>
<li>text-anchor</li>
<li>to</li>
<li>transform</li>
<li>type</li>
<li>u1</li>
<li>u2</li>
<li>underline-position</li>
<li>underline-thickness</li>
<li>unicode</li>
<li>unicode-range</li>
<li>units-per-em</li>
<li>values</li>
<li>version</li>
<li>viewBox</li>
<li>visibility</li>
<li>width</li>
<li>widths</li>
<li>x</li>
<li>x-height</li>
<li>x1</li>
<li>x2</li>
<li>xlink:actuate</li>
<li>xlink:arcrole</li>
<li>xlink:href</li>
<li>xlink:role</li>
<li>xlink:show</li>
<li>xlink:title</li>
<li>xlink:type</li>
<li>xml:base</li>
<li>xml:lang</li>
<li>xml:space</li>
<li>xmlns</li>
<li>xmlns:xlink</li>
<li>y</li>
<li>y1</li>
<li>y2</li>
<li>zoomAndPan</li>
</ul>
</td></tr></table>
</div>
<div class="section" id="mathml-sanitization">
<span id="advanced-sanitization-mathml"></span><h2><abbr title="Mathematical Markup Language">MathML</abbr> Sanitization<a class="headerlink" href="#mathml-sanitization" title="Permalink to this headline">¶</a></h2>
<p>The following <abbr title="Mathematical Markup Language">MathML</abbr> elements are
allowed by default (all others are stripped):</p>
<table class="hlist"><tr><td><ul class="simple">
<li>annotation</li>
<li>annotation-xml</li>
<li>maction</li>
<li>math</li>
<li>merror</li>
<li>mfenced</li>
<li>mfrac</li>
<li>mi</li>
<li>mmultiscripts</li>
<li>mn</li>
<li>mo</li>
</ul>
</td><td><ul class="simple">
<li>mover</li>
<li>mpadded</li>
<li>mphantom</li>
<li>mprescripts</li>
<li>mroot</li>
<li>mrow</li>
<li>mspace</li>
<li>msqrt</li>
<li>mstyle</li>
<li>msub</li>
</ul>
</td><td><ul class="simple">
<li>msubsup</li>
<li>msup</li>
<li>mtable</li>
<li>mtd</li>
<li>mtext</li>
<li>mtr</li>
<li>munder</li>
<li>munderover</li>
<li>none</li>
<li>semantics</li>
</ul>
</td></tr></table>
<p>The following <abbr title="Mathematical Markup Language">MathML</abbr> attributes are
allowed by default (all others are stripped):</p>
<table class="hlist"><tr><td><ul class="simple">
<li>actiontype</li>
<li>align</li>
<li>close</li>
<li>columnalign</li>
<li>columnlines</li>
<li>columnspacing</li>
<li>columnspan</li>
<li>depth</li>
<li>display</li>
<li>displaystyle</li>
<li>encoding</li>
<li>equalcolumns</li>
<li>equalrows</li>
<li>fence</li>
<li>fontstyle</li>
</ul>
</td><td><ul class="simple">
<li>fontweight</li>
<li>frame</li>
<li>height</li>
<li>linethickness</li>
<li>lspace</li>
<li>mathbackground</li>
<li>mathcolor</li>
<li>mathvariant</li>
<li>maxsize</li>
<li>minsize</li>
<li>open</li>
<li>other</li>
<li>rowalign</li>
<li>rowlines</li>
</ul>
</td><td><ul class="simple">
<li>rowspacing</li>
<li>rowspan</li>
<li>rspace</li>
<li>scriptlevel</li>
<li>selection</li>
<li>separator</li>
<li>separators</li>
<li>stretchy</li>
<li>width</li>
<li>xlink:href</li>
<li>xlink:show</li>
<li>xlink:type</li>
<li>xmlns</li>
<li>xmlns:xlink</li>
</ul>
</td></tr></table>
</div>
<div class="section" id="css-sanitization">
<span id="advanced-sanitization-css"></span><h2><abbr title="Cascading Style Sheets">CSS</abbr> Sanitization<a class="headerlink" href="#css-sanitization" title="Permalink to this headline">¶</a></h2>
<p>The following <abbr title="Cascading Style Sheets">CSS</abbr> properties are allowed by
default in style attributes (all others are stripped):</p>
<table class="hlist"><tr><td><ul class="simple">
<li>azimuth</li>
<li>background-color</li>
<li>border-bottom-color</li>
<li>border-collapse</li>
<li>border-color</li>
<li>border-left-color</li>
<li>border-right-color</li>
<li>border-top-color</li>
<li>clear</li>
<li>color</li>
<li>cursor</li>
<li>direction</li>
<li>display</li>
<li>elevation</li>
<li>float</li>
<li>font</li>
</ul>
</td><td><ul class="simple">
<li>font-family</li>
<li>font-size</li>
<li>font-style</li>
<li>font-variant</li>
<li>font-weight</li>
<li>height</li>
<li>letter-spacing</li>
<li>line-height</li>
<li>overflow</li>
<li>pause</li>
<li>pause-after</li>
<li>pause-before</li>
<li>pitch</li>
<li>pitch-range</li>
<li>richness</li>
</ul>
</td><td><ul class="simple">
<li>speak</li>
<li>speak-header</li>
<li>speak-numeral</li>
<li>speak-punctuation</li>
<li>speech-rate</li>
<li>stress</li>
<li>text-align</li>
<li>text-decoration</li>
<li>text-indent</li>
<li>unicode-bidi</li>
<li>vertical-align</li>
<li>voice-family</li>
<li>volume</li>
<li>white-space</li>
<li>width</li>
</ul>
</td></tr></table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Not all possible CSS values are allowed for these properties.  The
allowable values are restricted by a whitelist and a regular expression that
allows color values and lengths.  <abbr title="Uniform Resource Identifier">URI</abbr>s
are not allowed, to prevent <a class="reference external" href="http://diveintomark.org/archives/2003/06/12/how_to_consume_rss_safely">platypus attacks</a>.
See the <tt class="docutils literal"><span class="pre">_HTMLSanitizer</span></tt> class for more details.</p>
</div>
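<p>A rough sketch of this kind of filtering follows. It is not the library's implementation: the property set is a small subset of the list above, the value pattern is a hypothetical stand-in for the real whitelist and regular expression, and the function name <tt class="docutils literal"><span class="pre">filter_style</span></tt> is invented for illustration.</p>

```python
import re

# Illustrative subset of the property list above.
ALLOWED_PROPERTIES = {"background-color", "color", "font-size", "text-align", "width"}

# Hypothetical value pattern: hex colors, bare keywords, and simple lengths.
# Parentheses never match, so url(...) and expression(...) are rejected.
SAFE_VALUE = re.compile(r"^(#[0-9a-fA-F]{3,6}|[a-zA-Z-]+|\d+(\.\d+)?(px|em|pt|%)?)$")


def filter_style(style):
    """Keep only whitelisted declarations from a style attribute value."""
    kept = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        prop, value = declaration.split(":", 1)
        prop, value = prop.strip().lower(), value.strip()
        if prop in ALLOWED_PROPERTIES and SAFE_VALUE.match(value):
            kept.append("{}: {};".format(prop, value))
    return " ".join(kept)
```

<p>A declaration survives only if both its property and its value are recognized, so a value containing a URI is dropped even when the property itself is allowed.</p>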
</div>
<div class="section" id="whitelist-don-t-blacklist">
<h2>Whitelist, Don&#8217;t Blacklist<a class="headerlink" href="#whitelist-don-t-blacklist" title="Permalink to this headline">¶</a></h2>
<p>I am often asked why <strong class="program">Universal Feed Parser</strong> is so hard-assed about
<abbr title="HyperText Markup Language">HTML</abbr> and <abbr title="Cascading Style Sheets">CSS</abbr> sanitizing.  To illustrate the problem, here is an incomplete list of
potentially dangerous <abbr title="HyperText Markup Language">HTML</abbr> tags and
attributes:</p>
<ul class="simple">
<li>script, which can contain malicious script</li>
<li>applet, embed, and object, which can automatically download and execute malicious code</li>
<li>meta, which can contain malicious redirects</li>
<li>onload, onunload, and all other on* attributes, which can contain malicious script</li>
<li>style, link, and the style attribute, which can contain malicious script</li>
</ul>
<p><em>style?</em> Yes, style. <abbr title="Cascading Style Sheets">CSS</abbr> definitions can contain executable code.</p>
<div class="section" id="embedding-javascript-in-css">
<h3>Embedding JavaScript in <abbr title="Cascading Style Sheets">CSS</abbr><a class="headerlink" href="#embedding-javascript-in-css" title="Permalink to this headline">¶</a></h3>
<p>This sample is taken from <a class="reference external" href="http://feedparser.org/docs/examples/rss20.xml">http://feedparser.org/docs/examples/rss20.xml</a>:</p>
<div class="highlight-html"><div class="highlight"><pre><span class="nt">&lt;description&gt;</span>Watch out for
<span class="ni">&amp;lt;</span>span style=&quot;background: url(javascript:window.location=&#39;http://example.org/&#39;)&quot;<span class="ni">&amp;gt;</span>
nasty tricks<span class="ni">&amp;lt;</span>/span<span class="ni">&amp;gt;</span><span class="nt">&lt;/description&gt;</span>
</pre></div>
</div>
<p>This sample is more advanced: it does not contain the keyword <tt class="docutils literal"><span class="pre">javascript:</span></tt> that
many naive <abbr title="HyperText Markup Language">HTML</abbr> sanitizers scan for:</p>
<div class="highlight-html"><div class="highlight"><pre><span class="nt">&lt;description&gt;</span>Watch out for
<span class="ni">&amp;lt;</span>span style=&quot;any: expression(window.location=&#39;http://example.org/&#39;)&quot;<span class="ni">&amp;gt;</span>
nasty tricks<span class="ni">&amp;lt;</span>/span<span class="ni">&amp;gt;</span><span class="nt">&lt;/description&gt;</span>
</pre></div>
</div>
<p>Internet Explorer for Windows will execute the JavaScript in both of these examples.</p>
<p>Now consider that in <abbr title="HyperText Markup Language">HTML</abbr>, attribute values may be entity-encoded in several different ways.</p>
</div>
<div class="section" id="embedding-encoded-javascript-in-css">
<h3>Embedding encoded JavaScript in <abbr title="Cascading Style Sheets">CSS</abbr><a class="headerlink" href="#embedding-encoded-javascript-in-css" title="Permalink to this headline">¶</a></h3>
<p>To a browser, this:</p>
<div class="highlight-html"><div class="highlight"><pre><span class="nt">&lt;span</span> <span class="na">style=</span><span class="s">&quot;any: expression(window.location=&#39;http://example.org/&#39;)&quot;</span><span class="nt">&gt;</span>
</pre></div>
</div>
<p>is the same as this (without the line breaks):</p>
<div class="highlight-html"><div class="highlight"><pre><span class="nt">&lt;span</span> <span class="na">style=</span><span class="s">&quot;&amp;#97;&amp;#110;&amp;#121;&amp;#58;&amp;#32;&amp;#101;&amp;#120;&amp;#112;&amp;#114;&amp;#101;</span>
<span class="s">&amp;#115;&amp;#115;&amp;#105;&amp;#111;&amp;#110;&amp;#40;&amp;#119;&amp;#105;&amp;#110;&amp;#100;&amp;#111;&amp;#119;</span>
<span class="s">&amp;#46;&amp;#108;&amp;#111;&amp;#99;&amp;#97;&amp;#116;&amp;#105;&amp;#111;&amp;#110;&amp;#61;&amp;#39;&amp;#104;</span>
<span class="s">&amp;#116;&amp;#116;&amp;#112;&amp;#58;&amp;#47;&amp;#47;&amp;#101;&amp;#120;&amp;#97;&amp;#109;&amp;#112;&amp;#108;</span>
<span class="s">&amp;#101;&amp;#46;&amp;#111;&amp;#114;&amp;#103;&amp;#47;&amp;#39;&amp;#41;&quot;</span><span class="nt">&gt;</span>
</pre></div>
</div>
<p>which is the same as this (without the line breaks):</p>
<div class="highlight-html"><div class="highlight"><pre><span class="nt">&lt;span</span> <span class="na">style=</span><span class="s">&quot;&amp;#x61;&amp;#x6e;&amp;#x79;&amp;#x3a;&amp;#x20;&amp;#x65;&amp;#x78;&amp;#x70;&amp;#x72;</span>
<span class="s">&amp;#x65;&amp;#x73;&amp;#x73;&amp;#x69;&amp;#x6f;&amp;#x6e;&amp;#x28;&amp;#x77;&amp;#x69;&amp;#x6e;</span>
<span class="s">&amp;#x64;&amp;#x6f;&amp;#x77;&amp;#x2e;&amp;#x6c;&amp;#x6f;&amp;#x63;&amp;#x61;&amp;#x74;&amp;#x69;</span>
<span class="s">&amp;#x6f;&amp;#x6e;&amp;#x3d;&amp;#x27;&amp;#x68;&amp;#x74;&amp;#x74;&amp;#x70;&amp;#x3a;&amp;#x2f;</span>
<span class="s">&amp;#x2f;&amp;#x65;&amp;#x78;&amp;#x61;&amp;#x6d;&amp;#x70;&amp;#x6c;&amp;#x65;&amp;#x2e;&amp;#x6f;</span>
<span class="s">&amp;#x72;&amp;#x67;&amp;#x2f;&amp;#x27;&amp;#x29;&quot;</span><span class="nt">&gt;</span>
</pre></div>
</div>
<p>And so on, plus several other variations, plus every combination of every
variation.</p>
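<p>The effect of these encodings on a keyword scanner can be sketched with the standard library. The blacklist below is deliberately incomplete and the function names are hypothetical; the point is only that a scan of the raw attribute value and a scan of the decoded value give different answers.</p>

```python
import html

# A deliberately incomplete blacklist, for illustration only.
BAD_KEYWORDS = ("javascript:", "expression(")


def naive_scan(attr_value):
    """Return True if the raw attribute value contains a blacklisted keyword."""
    lowered = attr_value.lower()
    return any(keyword in lowered for keyword in BAD_KEYWORDS)


def scan_after_decoding(attr_value):
    """Decode character references first, as a browser would, then scan."""
    return naive_scan(html.unescape(attr_value))
```

<p>Given either of the entity-encoded style values shown above (joined onto one line), <tt class="docutils literal"><span class="pre">naive_scan</span></tt> finds nothing, while <tt class="docutils literal"><span class="pre">scan_after_decoding</span></tt> flags the value. Even then, the scanner catches only the keywords it already knows about.</p>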
<p>The more I investigate, the more cases I find where Internet Explorer for
Windows will treat seemingly innocuous markup as code and blithely execute it.
This is why <strong class="program">Universal Feed Parser</strong> uses a whitelist and not a
blacklist. I am reasonably confident that none of the elements or attributes on
the whitelist are security risks. I am not at all confident about elements or
attributes that I have not explicitly investigated. And I have no confidence at
all in my ability to detect strings within attribute values that Internet
Explorer for Windows will treat as executable code.</p>
<div class="admonition-see-also admonition seealso">
<p class="first admonition-title">See also</p>
<dl class="last docutils">
<dt><a class="reference external" href="http://diveintomark.org/archives/2003/06/12/how_to_consume_rss_safely">How to consume RSS safely</a></dt>
<dd>Explains the platypus attack.</dd>
</dl>
</div>
</div>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h3><a href="index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Sanitization</a><ul>
<li><a class="reference internal" href="#html-sanitization"><abbr title="HyperText Markup Language">HTML</abbr> Sanitization</a></li>
<li><a class="reference internal" href="#svg-sanitization"><abbr title="Scalable Vector Graphics">SVG</abbr> Sanitization</a></li>
<li><a class="reference internal" href="#mathml-sanitization"><abbr title="Mathematical Markup Language">MathML</abbr> Sanitization</a></li>
<li><a class="reference internal" href="#css-sanitization"><abbr title="Cascading Style Sheets">CSS</abbr> Sanitization</a></li>
<li><a class="reference internal" href="#whitelist-don-t-blacklist">Whitelist, Don&#8217;t Blacklist</a><ul>
<li><a class="reference internal" href="#embedding-javascript-in-css">Embedding JavaScript in <abbr title="Cascading Style Sheets">CSS</abbr></a></li>
<li><a class="reference internal" href="#embedding-encoded-javascript-in-css">Embedding encoded JavaScript in <abbr title="Cascading Style Sheets">CSS</abbr></a></li>
</ul>
</li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="date-parsing.html"
                        title="previous chapter">Date Parsing</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="content-normalization.html"
                        title="next chapter">Content Normalization</a></p>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
        &copy; Copyright 2004-2008 Mark Pilgrim, 2010-2012 Kurt McKee.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3.
    </div>
  </body>
</html>