<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
  <head>
    <title>
      WebMake: Documentation: Scraped Templates
    </title>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
    <meta name="generator" content="WebMake/2.3" />
    <style type="text/css">
      body {
       background-color: #ffffff; 
       color: #000000; 
       line-height: 110%;
       margin-left: 10px;
       margin-right: 10px;
      }
      p, table, td, th {
       font-family: verdana,lucida,helvetica,sans-serif;
       font-size: 11px;
       line-height: 110%;
      }
      pre {
       margin-left: 3%;
       white-space: pre;
      }
      code, samp, pre, p pre {
       font-family: "lucida console", "Courier New", courier, "fixed-width", monospace;
       font-weight: bold;
      }
      H1 {
       font-size: 150%; font-family: Garamond, "Book Antiqua",Times,serif;
       background: #FFCC66; text-align: center;
       padding: 0.5em 1em 0.5em 1em; border-width: 1px;
       border-color: black; border-style: solid; line-height: 120%;
      }
      H2 {
       font-size: 125%; font-family: Garamond, "Book Antiqua",Times,serif;
       background: #FFDD77; text-align: center;
       padding: 0.5em 1em 0.5em 1em; border-width: 1px;
       border-color: black; border-style: solid; line-height: 100%;
      }
      H3 {
       font-size: 100%; font-family: Garamond, "Book Antiqua",Times,serif;
       background: #FFEE88; text-align: center;
       padding: 0.5em 1em 0.5em 1em; border-width: 1px;
       border-color: black; border-style: solid;
      }
      H4 { font-size: 75%; font-family: Garamond, "Book Antiqua",Times,serif; }
      H5 { font-size: 50%; font-family: Garamond, "Book Antiqua",Times,serif; }
      H6 { font-size: 25%; font-family: Garamond, "Book Antiqua",Times,serif; }
      A:link {
       font-weight: bold;
       color: #004000;
       text-decoration: underline; 
      }
      A:visited {
       font-weight: bold;
       color: #008000;
       text-decoration: underline; 
      }
      A:active {
       font-weight: bold;
       color: #800000;
       text-decoration: underline; 
      }
      dt {
       font-size: medium;
       font-weight: bold;
       padding-top: 8px; padding-bottom: 8px;
      }
      dd {
       padding-top: 8px; padding-bottom: 8px;
      }
    </style>
  </head>
  <body bgcolor="#ffffff" text="#000000" link="#3300cc" vlink="#660066">
    <!-- font tag for compat with non-CSS browsers -->
    <font face="lucida,verdana,sans-serif">
      <div align="center">
         <img src="images/WebMakeTitle.png" alt="WebMake" width="500" height="122" />
      </div>
      <table width="100%">
        <tr>
          <td valign="top">
             <strong><a href="http://webmake.taint.org/">WebMake</a>
             Documentation</strong> (version 2.3)
             
          </td>
          <td valign="top">
            <div align="right">
              
               [ <a href="globs.html">Back</a> | <a href="index_04-var_refs.html">Forward</a> | <a href="index.html">Index</a>
               | <a href="allinone.html">All&nbsp;In&nbsp;One</a> ]
               
            </div>
          </td>
        </tr>
      </table>
<!-- yes, it's that Mozilla black-border code again ;) -->
      <!-- stolen from www.mozilla.org via rc3.org -->
            <table border="0" cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td bgcolor="#aaaaaa">
            <table border="0" cellspacing="4" cellpadding="4" width="100%">
              <tr>
                <td bgcolor="#ffffff">
                  <table border="0" cellspacing="4" cellpadding="4" width="100%">
                    <tr>
                      <td>
                         <h1>Scraped Templates</h1><p>
                          This is a very neat trick. A common problem with templating systems, such as
                          WebMake, is that they <strong>don't actually help at all</strong> in certain areas.
                          
                        </p>
                        <p>
                          Here's one of the problems. When an HTML Guy edits a page template, he's
                          typically going to edit <em>an entire page</em>, not just small snippets;
                          he has to see what the overall page looks like, align the items correctly,
                          make sure this font looks OK with that font, this bgcolor with that bgcolor,
                          etc.
                          
                        </p>
                        <p>
                          However, as <strong>Talin</strong> mentions in <a href="http://www.advogato.org/article/350.html#4">this thread on Advogato</a>,
                          there's a problem: most large web sites use the notion of "components" -
                          that is, re-usable fragments of dynamic HTML which are assembled to form a
                          complete page.
                          
                        </p>
                        <p>
                          So once the HTML Guy has designed up a good-looking, nice page to display "a
                          list of top 10 selling movies on a site that sells VHS tapes", as the example
                          in the Advo article suggests, the page now contains the following templates:
                          
                        </p>
                        <ul>
                          <li>
                            overall page template
                            
                          </li>
                          <li>
                            top-10 page content
                            
                          </li>
                          <li>
                            top-10 list table template
                            
                          </li>
                          <li>
                            one-row-of-the-table template (which could in turn be broken down
                            into 2 templates: one for odd rows, one for even, etc.)
                            
                          </li>
                        </ul>
                        <p>
                          So someone has to take the page the HTML Guy has created and cut it up into
                          components (template and content items, in WebMake terminology). What a pain.
                          
                        </p>
                        <p>
                          How do we deal with this problem?
                          
                        </p>
                        <a name="Scraping" id="Scraping"><h2>Scraping</h2></a><p>
                          WebMake has some features which help here:
                          
                        </p>
                        <ul>
                          <li>
                            <strong>Content "src" attribute</strong>: templates can be loaded from a named
                            file (or even a remote webpage). Multiple templates or content
                            items can be loaded from the same file.
                            
                          </li>
                          <li>
                            <strong>Pre-processing</strong>: Using the <em>preproc</em> attribute, you can specify
                            a block of perl code to execute over each content item's text.
                            
                          </li>
                          <li>
                            <strong>Scraping</strong>: The <code>scrape_xml()</code> and <code>scrape_out_xml()</code> perl code
                            library functions allow you to easily cut out the bits of the page you
                            want, based on patterns in the page text or HTML.
                            
                          </li>
                        </ul>
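                        <p>
                          To make the two scraping operations concrete, here is a minimal sketch in
                          plain perl. The helpers <code>keep_between</code> and <code>cut_between</code> are
                          hypothetical stand-ins written for this illustration; they are not WebMake's
                          implementation, and the real <code>scrape_xml()</code> and <code>scrape_out_xml()</code>
                          may treat tag and comment boundaries differently. The marker names are
                          made up too.
                        </p>

```perl
use strict;
use warnings;

# Hypothetical stand-ins, NOT WebMake's implementation: they only model
# "keep the span between two patterns" and "cut that span out".
sub keep_between {
    my ($text, $start, $end) = @_;
    return $text =~ /$start(.*?)$end/s ? $1 : undef;
}

sub cut_between {
    my ($text, $start, $end) = @_;
    $text =~ s/$start.*?$end//s;
    return $text;
}

# A made-up one-line page with one marked zone.
my $page  = "HEAD <!-- start of list -->ITEMS<!-- end of list --> FOOT";
my $start = qr/<!--\s*start of list\s*-->/i;
my $end   = qr/<!--\s*end of list\s*-->/i;

print keep_between($page, $start, $end), "\n";   # the part between the markers
print cut_between($page, $start, $end), "\n";    # the page with that zone removed
```

                        <p>
                          One call keeps a zone, the other throws it away; everything later on
                          this page is a combination of those two moves.
                        </p>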
                        <p>
                          What you need to do is isolate -- or specify to the HTML Guy -- some patterns
                          in the text that delimit the areas of the page you will be turning
                          into templates. You then set up WebMake commands which will scrape the
                          templates from the designer-provided page.
                          
                        </p>
                        <p>
                          Let's go with the 'top-10 videos on VHS' list page example from the Advogato
                          thread. That contains the following templates:
                          
                        </p>
                        <ul>
                          <li>
                            overall page template
                            
                          </li>
                          <li>
                            top-10 page content (text, images maybe etc.)
                            
                          </li>
                          <li>
                            top-10 list table template
                            
                          </li>
                          <li>
                            one-row-of-the-table template (which could in turn be broken down
                            into 2 templates: one for odd rows, one for even, etc.)
                            
                          </li>
                        </ul>
                        <p>
                          Let's say the designer has provided you with this page, called "top10.htm"
                          (hopefully he's filled in the ... bits, of course!):
                          
                        </p>
                        <p>
                          <!--etsafe-->
                          <pre>

    &lt;html&gt;
      &lt;head&gt;
      &lt;title&gt;Top 10 Movies on VHS&lt;/title&gt;
      &lt;/head&gt;&lt;body&gt;

      .... blah blah navigation, other generic-page-template stuff ...

      &lt;!-- start of top-10 page content --&gt;

      Lorem ipsum dolor sit amet, consectetaur adipisicing elit, sed do
      eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
      ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
      aliquip ex ea commodo consequat. ...

      &lt;!-- start of top-10 table --&gt;
      &lt;table bgcolor=nice etc.&gt;

	&lt;!-- start of even row --&gt;
	&lt;tr&gt;
	  &lt;td&gt;....&lt;/td&gt; &lt;td&gt;....&lt;/td&gt; &lt;td&gt;....&lt;/td&gt;
	&lt;/tr&gt;
	&lt;!-- end of even row --&gt;

	&lt;!-- start of odd row --&gt;
	&lt;tr&gt;
	  &lt;td&gt;....&lt;/td&gt; &lt;td&gt;....&lt;/td&gt; &lt;td&gt;....&lt;/td&gt;
	&lt;/tr&gt;
	&lt;!-- end of odd row --&gt;

      &lt;/table&gt;
      &lt;!-- end of top-10 table --&gt;

      &lt;!-- end of top-10 page content --&gt;

      .... blah blah more generic-page-template stuff ....
      &lt;/body&gt;
    &lt;/html&gt;

</pre><!--/etsafe-->
                          
                        </p>
                        <p>
                          We can see that the following content or template items can be scraped
                          out:
                          
                        </p>
                        <ul>
                          <li>
                            overall page template: everything between the <code>html</code> tags, but with
                            text from <code>start of top-10 page content</code> to <code>end of top-10 page
                            content</code> stripped out
                            
                          </li>
                          <li>
                            top-10 page content: <code>start of top-10 page content</code> to <code>end of
                            top-10 page content</code>, strip out <code>top-10 table</code> section
                            
                          </li>
                          <li>
                            top-10 list template: <code>top-10 table</code>, strip out <code>even row</code>
                            and <code>odd row</code> sections
                            
                          </li>
                          <li>
                            even-table-row template: <code>even row</code>
                          </li>
                          <li>
                            odd-table-row template: <code>odd row</code>
                          </li>
                        </ul>
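                        <p>
                          Before wiring anything into WebMake, the decomposition in the list above can
                          be tried out in plain perl. This is only a sketch: <code>marker</code>,
                          <code>keep_between</code> and <code>cut_between</code> are hypothetical helpers (not
                          WebMake functions), and the page is a cut-down stand-in for "top10.htm" with
                          placeholder text in each zone.
                        </p>

```perl
use strict;
use warnings;

# Hypothetical helpers (not WebMake functions): keep or cut the zone
# delimited by a pair of "start of X" / "end of X" marker comments.
sub marker { my ($name) = @_; return qr/<!--\s*\Q$name\E\s*-->/i; }

sub keep_between {
    my ($text, $zone) = @_;
    my ($from, $to) = (marker("start of $zone"), marker("end of $zone"));
    return $text =~ /$from(.*?)$to/s ? $1 : undef;
}

sub cut_between {
    my ($text, $zone) = @_;
    my ($from, $to) = (marker("start of $zone"), marker("end of $zone"));
    $text =~ s/$from.*?$to//s;
    return $text;
}

# A cut-down stand-in for top10.htm.
my $page = join '',
    '<html>NAV',
    '<!-- start of top-10 page content -->INTRO',
    '<!-- start of top-10 table --><table>',
    '<!-- start of even row --><tr>E</tr><!-- end of even row -->',
    '<!-- start of odd row --><tr>O</tr><!-- end of odd row -->',
    '</table><!-- end of top-10 table -->',
    '<!-- end of top-10 page content -->',
    'FOOT</html>';

my $content = keep_between($page, 'top-10 page content');
my $table   = keep_between($page, 'top-10 table');

# One entry per item in the list above.
my %piece = (
    page_template => cut_between($page, 'top-10 page content'),
    top10_content => cut_between($content, 'top-10 table'),
    list_template => cut_between(cut_between($table, 'even row'), 'odd row'),
    even_row      => keep_between($page, 'even row'),
    odd_row       => keep_between($page, 'odd row'),
);

print "$_: $piece{$_}\n" for sort keys %piece;
```

                        <p>
                          Each outer piece is what is left once its inner zones have been cut away,
                          and each inner piece is kept on its own.
                        </p>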
                        <p>
                          That translates into this WebMake code:
                          
                        </p>
                        <p>
                          <!--etsafe-->
                          <pre>
  &lt;{perl        # define the scraping functions we will use.

  sub scrape_page_template {
    return scrape_out_xml (shift,
        qr/start of top-10 page content/i, qr/end of top-10 page content/i);
  }

  sub scrape_top10_content {
    my &#36;text = scrape_xml (shift,
        qr/start of top-10 page content/i, qr/end of top-10 page content/i);
    return scrape_out_xml (&#36;text,
        qr/start of top-10 table/i, qr/end of top-10 table/i);
  }

  sub scrape_top10_list_template {
    my &#36;text = scrape_xml (shift,
        qr/start of top-10 table/i, qr/end of top-10 table/i);
    &#36;text = scrape_out_xml (&#36;text,
        qr/start of even row/i, qr/end of even row/i);
    return scrape_out_xml (&#36;text,
        qr/start of odd row/i, qr/end of odd row/i);
  }

  sub scrape_top10_even_row_template {
    return scrape_xml (shift, qr/start of even row/i, qr/end of even row/i);
  }

  sub scrape_top10_odd_row_template {
    return scrape_xml (shift, qr/start of odd row/i, qr/end of odd row/i);
  }

  # (Note that the qr// search patterns use the 'i' modifier;
  # non-programmers love to mess with capitalisation ;)

  '';           # replace this perl block with an empty string

  }&gt;

  &lt;!-- and now define the templates, using those functions: --&gt;
  &lt;template name="page_template" src="top10.htm"
                          preproc=scrape_page_template&gt;&lt;/template&gt;
  &lt;content name="top10_content" src="top10.htm"
                          preproc=scrape_top10_content&gt;&lt;/content&gt;
  &lt;template name="top10_list_template" src="top10.htm"
                          preproc=scrape_top10_list_template&gt;&lt;/template&gt;
  &lt;template name="top10_even_row_template" src="top10.htm"
                          preproc=scrape_top10_even_row_template&gt;&lt;/template&gt;
  &lt;template name="top10_odd_row_template" src="top10.htm"
                          preproc=scrape_top10_odd_row_template&gt;&lt;/template&gt;

</pre><!--/etsafe-->
                          
                        </p>
                        <p>
                          That's it. Those templates can now be used safely in the site logic,
                          and will work as long as the page designer doesn't muck about with
                          the comments too much.
                          
                        </p>
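                        <p>
                          Since everything hinges on the markers surviving the designer's edits, it
                          may be worth checking a freshly delivered page before building from it. The
                          sketch below is one way to do that in plain perl; <code>check_zones</code> and
                          <code>find_offsets</code> are hypothetical helpers, not part of WebMake, and the
                          sample page is made up.
                        </p>

```perl
use strict;
use warnings;

# Hypothetical helper: report zones whose start/end marker phrases are
# missing, duplicated, or out of order in a designer-supplied page.
sub check_zones {
    my ($text, @zones) = @_;
    my @problems;
    for my $zone (@zones) {
        my @starts = find_offsets($text, "start of $zone");
        my @ends   = find_offsets($text, "end of $zone");
        if (@starts != 1 || @ends != 1) {
            push @problems, "$zone: expected one start and one end marker, "
                          . "found " . @starts . " and " . @ends;
        }
        elsif ($starts[0] > $ends[0]) {
            push @problems, "$zone: end marker appears before start marker";
        }
    }
    return @problems;
}

# Offsets of every case-insensitive occurrence of a marker phrase.
sub find_offsets {
    my ($text, $phrase) = @_;
    my @offsets;
    push @offsets, $-[0] while $text =~ /\Q$phrase\E/gi;
    return @offsets;
}

# Made-up page: the designer dropped the "end of odd row" comment.
my $page = '<!-- Start of even row --><tr></tr><!-- end of even row -->'
         . '<!-- start of odd row --><tr></tr>';

print "$_\n" for check_zones($page, 'even row', 'odd row');
```

                        <p>
                          An empty result means every zone has exactly one start marker followed by
                          one end marker; anything else is a message to pass back to the designer.
                        </p>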
                        <p>
                          You don't have to use comments, by the way; if your HTML Guy's editor allows
                          him to mark out "zones" of a page in some way, then just use whatever zone
                          markers it provides instead, or even just use patterns in the HTML tags or
                          text.
                          
                        </p>
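                        <p>
                          As a rough illustration of delimiting a zone by HTML tags rather than
                          comments, here is a plain-perl sketch. <code>element_between</code> is a
                          hypothetical helper, not a WebMake function, and the <code>id="top10"</code>
                          attribute is an assumed convention agreed with the designer.
                        </p>

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WebMake): return the text from the first
# match of $open up to and including the next match of $close.
# Naive by design: it does not cope with nested elements of the same kind.
sub element_between {
    my ($text, $open, $close) = @_;
    return $text =~ /($open.*?$close)/s ? $1 : undef;
}

# Made-up page with two tables; only one carries the agreed id.
my $page = '<body><p>intro</p>'
         . '<table id="top10"><tr><td>1</td></tr></table>'
         . '<table id="other"></table></body>';

my $table = element_between(
    $page,
    qr/<table\b[^>]*\bid="top10"[^>]*>/i,   # opening tag with the agreed id
    qr{</table>}i,                          # first closing tag after it
);

print "$table\n";
```

                        <p>
                          Tag patterns like these are more fragile than comments, since a designer
                          can reorder attributes or nest another table without noticing.
                        </p>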
                      </td>
                    </tr>
                  </table>
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
    </font>
  </body>
</html>