<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
 <head profile="http://internetalchemy.org/2003/02/profile">
  <link rel="foaf" type="application/rdf+xml" title="FOAF" href="http://www.openlinksw.com/dataspace/uda/about.rdf" />
  <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
  <meta name="dc.title" content="11. Data Replication, Synchronization and Transformation Services" />
  <meta name="dc.subject" content="11. Data Replication, Synchronization and Transformation Services" />
  <meta name="dc.creator" content="OpenLink Software Documentation Team" />
  <meta name="dc.copyright" content="OpenLink Software, 1999 - 2009" />
  <link rel="top" href="index.html" title="OpenLink Virtuoso Universal Server: Documentation" />
  <link rel="search" href="/doc/adv_search.vspx" title="Search OpenLink Virtuoso Universal Server: Documentation" />
  <link rel="parent" href="repl.html" title="Chapter Contents" />
  <link rel="prev" href="replexamples.html" title="Transactional Replication Example" />
  <link rel="next" href="webappdevelopment.html" title="Web Application Development" />
  <link rel="shortcut icon" href="../images/misc/favicon.ico" type="image/x-icon" />
  <link rel="stylesheet" type="text/css" href="doc.css" />
  <link rel="stylesheet" type="text/css" href="/doc/translation.css" />
  <title>11. Data Replication, Synchronization and Transformation Services</title>
  <meta http-equiv="Content-Type" content="text/xhtml; charset=UTF-8" />
  <meta name="author" content="OpenLink Software Documentation Team" />
  <meta name="copyright" content="OpenLink Software, 1999 - 2009" />
  <meta name="keywords" content="" />
  <meta name="GENERATOR" content="OpenLink XSLT Team" />
 </head>
 <body>
  <div id="header">
    <a name="replsample" />
    <img src="../images/misc/logo.jpg" alt="" />
    <h1>11. Data Replication, Synchronization and Transformation Services</h1>
  </div>
  <div id="navbartop">
   <div>
      <a class="link" href="repl.html">Chapter Contents</a> | <a class="link" href="replexamples.html" title="Transactional Replication Example">Prev</a> | <a class="link" href="webappdevelopment.html" title="Web Application Development">Next</a>
   </div>
  </div>
  <div id="currenttoc">
   <div>
      <a href="http://www.openlinksw.com/">www.openlinksw.com</a>
   </div>
   <div>
      <a href="http://docs.openlinksw.com/">docs.openlinksw.com</a>
   </div>
    <br />
   <div>
      <a href="index.html">Book Home</a>
   </div>
    <br />
   <div>
      <a href="contents.html">Contents</a>
   </div>
   <div>
      <a href="preface.html">Preface</a>
   </div>
    <br />
   <div class="selected">
      <a href="repl.html">Data Replication, Synchronization and Transformation Services</a>
   </div>
    <br />
   <div>
      <a href="replintro.html">Introduction</a>
   </div>
   <div>
      <a href="SNAPSHOT.html">Snapshot Replication</a>
   </div>
   <div>
      <a href="proctransrepl.html">Transactional Replication </a>
   </div>
   <div>
      <a href="SCHEDULER.html">Virtuoso scheduler</a>
   </div>
   <div>
      <a href="replexamples.html">Transactional Replication Example</a>
   </div>
   <div class="selected">
      <a href="replsample.html">Replication Logger Sample</a>
    <div>
        <a href="#loggercfg" title="Configuration of the Sample">Configuration of the Sample</a>
        <a href="#loggersync" title="Synchronization">Synchronization</a>
        <a href="#runninglogger" title="Running the Sample">Running the Sample</a>
        <a href="#loggerdynamics" title="Notes on the Sample's Dynamics">Notes on the Sample&#39;s Dynamics</a>
    </div>
   </div>
    <br />
  </div>
  <div id="text">
<a name="replsample" />
    <h2>11.6. Replication Logger Sample</h2>

	<p>
The logger directory among the samples in the distribution contains a simple
load balancing sample.  It implements a simplified web site hit log that maintains
a hit count per user name and per origin IP address.
</p>
	<p>
Thus the transaction being replicated between the servers consists of incrementing
an IP&#39;s hit count and then incrementing a user&#39;s hit count.  If either the
IP or the user does not yet have a count, a row is added with a count of 1.  The
transaction is then logged for replication, so that all servers get all hits, no matter
which of the replicating servers processes the hit.
</p>
	<div>
      <pre class="programlisting">
create table wl_ip_cnt (ic_ip varchar, ic_cnt integer,
       primary key (ic_ip));
</pre>
    </div>
	<div>
      <pre class="programlisting">
create table wl_user (wu_user varchar, wu_cnt integer,
       primary key (wu_user));
</pre>
    </div>
	<div>
      <pre class="programlisting">
create procedure wl_hit_repl (in ip varchar, in usr varchar)
{
  set isolation = &#39;serializable&#39;;
  -- Increment the IP&#39;s count; if no row was updated, insert one with a count of 1.
  update wl_ip_cnt set ic_cnt = ic_cnt + 1 where ic_ip = ip;
  if (0 = row_count ())
    insert into wl_ip_cnt (ic_ip, ic_cnt) values (ip, 1);
  -- Do the same for the user&#39;s count.
  update wl_user set wu_cnt = wu_cnt + 1 where wu_user = usr;
  if (0 = row_count ())
    insert into wl_user (wu_user, wu_cnt) values (usr, 1);
}
</pre>
    </div>
	<div>
      <pre class="programlisting">
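create procedure wl_hit (in ip varchar, in usr varchar)
{
  -- Apply the hit locally.
  wl_hit_repl (ip, usr);
  -- Log the same call on the &#39;hits&#39; publication so that subscribers replay it.
  repl_text (&#39;hits&#39;, &#39;wl_hit_repl (?, ?)&#39;, ip, usr);
}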
</pre>
    </div>
	<p>
The application client calls wl_hit on one of the mutually replicating
servers to log an event.  The event&#39;s trace is then propagated to all other servers.
The wl_hit_repl procedure does the actual work.  The top-level wl_hit procedure
calls it and then uses repl_text to log the call, with its arguments, on the local
server&#39;s hits publication for distribution to the other servers.
</p>
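	<p>
As a minimal usage sketch, assuming freshly created, empty tables and a server on
which the hits publication has already been defined, the calls below log two hits
from one address for two users and then read the counters back.  The address and
user names are made-up values for illustration.
</p>
	<div>
      <pre class="programlisting">
-- Hypothetical sample values; any IP string and user name will do.
wl_hit (&#39;192.168.0.10&#39;, &#39;alice&#39;);
wl_hit (&#39;192.168.0.10&#39;, &#39;bob&#39;);

-- With empty tables beforehand, this should show one row with a count of 2 ...
select ic_ip, ic_cnt from wl_ip_cnt;
-- ... and this should show two rows with a count of 1 each.
select wu_user, wu_cnt from wl_user;
</pre>
    </div>
	<p>
Once the other servers have replayed the two logged wl_hit_repl calls, the same
queries should return the same counts there.
</p>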
	
	<a name="loggercfg" />
    <h3>11.6.1. Configuration of the Sample</h3>
	<p>
The following sequence of calls can be used to define a network
of four servers, each replicating every other server. For the sake of example,
they are all on localhost and listen at ports 2001 through 2004.
</p>
	<div>
      <pre class="programlisting">
repl_server (&#39;log1&#39;, &#39;localhost:2001&#39;);
repl_server (&#39;log2&#39;, &#39;localhost:2002&#39;);
repl_server (&#39;log3&#39;, &#39;localhost:2003&#39;);
repl_server (&#39;log4&#39;, &#39;localhost:2004&#39;);
</pre>
    </div>
	<div>
      <pre class="programlisting">
repl_publish (&#39;hits&#39;, &#39;hits.log&#39;);
</pre>
    </div>
	<div>
      <pre class="programlisting">
repl_subscribe (&#39;log1&#39;, &#39;hits&#39;);
repl_subscribe (&#39;log2&#39;, &#39;hits&#39;);
repl_subscribe (&#39;log3&#39;, &#39;hits&#39;);
repl_subscribe (&#39;log4&#39;, &#39;hits&#39;);
</pre>
    </div>
	<p>
First, all the servers are identified.  Next, the local server declares that it has
a publication named &#39;hits&#39;.  Finally, it subscribes to the hits publication of
every server in the list.  Because the list includes the local server, one of the
repl_subscribe calls is a subscription to itself; that call signals an
error and has no other effect.
</p>
	<p>
This way all servers can share one configuration script.  Each server determines which
of the listed servers it is from the DBName setting in its virtuoso.ini file.
</p>
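	<p>
If the error from the self-subscription is unwanted, one possible variation is to
skip the local server by comparing each name with repl_this_server (), the same test
that the log_sync procedure in the next section uses.  The sketch below is not part
of the sample; the procedure name log_subscribe_all is hypothetical, and it assumes
the server names passed to repl_server match what repl_this_server () returns.
</p>
	<div>
      <pre class="programlisting">
-- Hypothetical helper, not part of the shipped sample.
create procedure log_subscribe_all ()
{
  declare servers any;
  declare i integer;
  servers := vector (&#39;log1&#39;, &#39;log2&#39;, &#39;log3&#39;, &#39;log4&#39;);
  i := 0;
  while (i &lt; length (servers))
    {
      -- Subscribe to every listed server except the one running this code.
      if (aref (servers, i) &lt;&gt; repl_this_server ())
        repl_subscribe (aref (servers, i), &#39;hits&#39;);
      i := i + 1;
    }
}
</pre>
    </div>
	<p>
The script stays identical on all four servers, since the decision about which
subscription to skip is made at run time.
</p>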
<br />

	
	<a name="loggersync" />
    <h3>11.6.2. Synchronization</h3>

	<div>
      <pre class="programlisting">
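create procedure log_sync ()
{
  -- Request a sync for every subscription except one to the local server itself.
  for select SERVER, ACCOUNT from SYS_REPL_ACCOUNTS do
    {
      if (SERVER &lt;&gt; repl_this_server ())
        {
          declare err, msg varchar;
          err := &#39;00000&#39;;
          -- exec returns any error in err and msg instead of signalling it.
          exec (&#39;repl_sync (?, ?, ?, ?)&#39;, err, msg, vector (SERVER, ACCOUNT, &#39;dba&#39;, &#39;dba&#39;), 0);
        }
    }
}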
</pre>
    </div>
	<p>
This procedure goes through all subscriptions and requests a sync for each, skipping
the local server.  The repl_sync function is called through exec so that any error,
such as a server being unavailable, is returned in err and msg instead of aborting
the loop.  For simplicity the sample passes the default dba login (&#39;dba&#39;, &#39;dba&#39;)
as literal credentials.
</p>
	<p>
The replication sample schedules a call to this function to be made every minute
as a background job.  If all replication servers are online and
in sync or syncing, the function returns without delay or effect.  Otherwise
it will keep trying until it gets a connection.
</p>
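	<p>
One way such a job could be registered is through the Virtuoso scheduler described
earlier in this chapter.  The sketch below assumes a SYS_SCHEDULED_EVENT table with
the columns SE_NAME, SE_START, SE_INTERVAL (in minutes) and SE_SQL, and a scheduler
that has been enabled in virtuoso.ini; verify both against the scheduler section
before relying on it.  The event name is a made-up example and may differ from what
the sample scripts use.
</p>
	<div>
      <pre class="programlisting">
-- Assumed scheduler table and columns; &#39;hits_log_sync&#39; is a hypothetical event name.
-- An interval of 1 asks for log_sync to be run once a minute.
insert into SYS_SCHEDULED_EVENT (SE_NAME, SE_START, SE_INTERVAL, SE_SQL)
  values (&#39;hits_log_sync&#39;, now (), 1, &#39;log_sync ()&#39;);
</pre>
    </div>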
<br />

	
	<a name="runninglogger" />
    <h3>11.6.3. Running the Sample</h3>

	<p>
The logger directory contains various scripts for starting and stopping
servers etc.
</p>
	<p>
<strong>log_init.sh</strong>	- Creates the databases, with tables and procedures loaded, in the
 l1, l2, l3 and l4 subdirectories.
</p>
	<p>
<strong>log_start.sh</strong>	- Starts the 4 servers and leaves them running in the background.
</p>
	<p>
<strong>log_shut.sh</strong>	- Shuts down the 4 test servers.
</p>
	<p>
<strong>hits.sh</strong>	&lt;hits-per-hour&gt; &lt;no-of-hits&gt;
</p>
	<p>
Starts the hits program on each of the 4 servers.  The first
command line argument gives the test transaction rate for each client and the second
gives the length of the run as a transaction count.
</p>

	<div>
      <pre class="screen">
hits &lt;dsn&gt; &lt;uid&gt; &lt;pwd&gt; &lt;hits-per-hour&gt; &lt;no-of-hits&gt;
</pre>
    </div>

	<p>
The hits executable repeatedly calls wl_hit with random arguments and
collects statistics on call times.  If calls complete faster than the
requested rate, the program periodically sleeps to keep the actual rate
close to the requested one.  It prints statistics every 1000 hits.
</p>
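	<p>
When the hits client is not at hand, a rough server-side stand-in can be written in
Virtuoso/PL.  The sketch below is an assumption-laden illustration, not the hits
program itself: the procedure name wl_hit_test and the value ranges are made up, and
it neither throttles the rate nor collects timing statistics.
</p>
	<div>
      <pre class="programlisting">
-- Hypothetical load generator: calls wl_hit n times with random arguments.
create procedure wl_hit_test (in n integer)
{
  declare i integer;
  i := 0;
  while (i &lt; n)
    {
      -- rnd (k) yields a random integer from 0 to k - 1.
      wl_hit (sprintf (&#39;10.0.0.%d&#39;, rnd (256)), sprintf (&#39;user%d&#39;, rnd (100)));
      -- Commit each hit as its own transaction, as separate client calls would.
      commit work;
      i := i + 1;
    }
}
</pre>
    </div>
	<p>
Calling wl_hit_test (1000) on one server would then feed 1000 hits into the network
from that server.
</p>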
<br />

	
	<a name="loggerdynamics" />
    <h3>11.6.4. Notes on the Sample&#39;s Dynamics</h3>
	<p>
When the network initially starts, all the publications are at level 0 and
in sync.  When transactions are fed into the network at a sufficiently slow
rate, all the servers process all transactions in real time.  Note that the
structure is such that every server does every other server&#39;s work in addition to its
own.  Thus the insertion rate of the network cannot be expected to be higher than
that of an individual server.  However, read load can be spread across servers, so
this type of configuration is effective for balancing query load but not
for balancing update load.
</p>
	<p>
As the transaction rate at each server increases, a point is reached at which
the queue of locally committed but not yet replicated transactions grows faster than
the other servers can absorb the feed.  Each server will eventually disconnect
all synced replication to stop the queue from growing.  Once the queue, which no longer
grows, becomes empty, the subscribers are disconnected.  From this
point on, each server processes only its own load.
</p>
	<p>
Next, each server will notice that it is disconnected from the network and will
attempt a resync as a result of the periodic scheduled call to log_sync.
Each server will then re-establish a connection to every other server and
start resyncing.  This brings the network back into
sync if the per-server transaction rate slows down sufficiently to allow
the replicators to catch up.  If this does not happen, the syncing can stay in progress
indefinitely, until it either reaches sync or is terminated.
</p>
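	<p>
A simple way to check from the application side that the network has caught up is to
compare counter totals.  This sketch relies only on the sample&#39;s own tables and
assumes that nothing other than wl_hit modifies them.  Since every hit increments
exactly one IP counter and one user counter, the two totals should be equal on any
one server, and once all feeds have been replayed they should also be equal across
servers.
</p>
	<div>
      <pre class="programlisting">
-- Run on each server after the load has stopped and compare the results.
select sum (ic_cnt) from wl_ip_cnt;
select sum (wu_cnt) from wl_user;
</pre>
    </div>
	<p>
A server whose totals are lower than the others has not yet replayed all hits
logged elsewhere.
</p>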
	<p>
Typically a server&#39;s capacity for processing local transactions is greater
than its capacity for replaying the replication feed.  This is because a single thread
is responsible for all replay activity, while many threads can process local
transactions.
</p>
	<p>
The net result of this scheduling policy is that even a heavily replicated
network will scale to high peak loads and will automatically return to the sync state
as soon as the peak is over.  If guaranteed transaction-level synchronicity
must be maintained between servers, the application should not be written
using transactional replication but rather with distributed transactions,
where each commit makes sure the transaction is fully processed on each participant before
returning to the client.  This is, however, up to several times slower and
will stop the entire network if a single node fails.
</p>
<br />
<table border="0" width="90%" id="navbarbottom">
    <tr>
        <td align="left" width="33%">
          <a href="replexamples.html" title="Transactional Replication Example">Previous</a>
          <br />Transactional Replication Example</td>
     <td align="center" width="34%">
          <a href="repl.html">Chapter Contents</a>
     </td>
        <td align="right" width="33%">
          <a href="webappdevelopment.html" title="Web Application Development">Next</a>
          <br />Contents of Web Application Development</td>
    </tr>
    </table>
  </div>
  <div id="footer">
    <div>Copyright © 1999 - 2009 OpenLink Software. All rights reserved.</div>
   <div id="validation">
    <a href="http://validator.w3.org/check/referer">
        <img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" />
    </a>
    <a href="http://jigsaw.w3.org/css-validator/">
        <img src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!" height="31" width="88" />
    </a>
   </div>
  </div>
 </body>
</html>