<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head profile="http://internetalchemy.org/2003/02/profile"> <link rel="foaf" type="application/rdf+xml" title="FOAF" href="http://www.openlinksw.com/dataspace/uda/about.rdf" /> <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" /> <meta name="dc.title" content="11. Data Replication, Synchronization and Transformation Services" /> <meta name="dc.subject" content="11. Data Replication, Synchronization and Transformation Services" /> <meta name="dc.creator" content="OpenLink Software Documentation Team ; " /> <meta name="dc.copyright" content="OpenLink Software, 1999 - 2009" /> <link rel="top" href="index.html" title="OpenLink Virtuoso Universal Server: Documentation" /> <link rel="search" href="/doc/adv_search.vspx" title="Search OpenLink Virtuoso Universal Server: Documentation" /> <link rel="parent" href="repl.html" title="Chapter Contents" /> <link rel="prev" href="replexamples.html" title="Transactional Replication Example" /> <link rel="next" href="contents.html" title="Contents" /> <link rel="shortcut icon" href="../images/misc/favicon.ico" type="image/x-icon" /> <link rel="stylesheet" type="text/css" href="doc.css" /> <link rel="stylesheet" type="text/css" href="/doc/translation.css" /> <title>11. Data Replication, Synchronization and Transformation Services</title> <meta http-equiv="Content-Type" content="text/xhtml; charset=UTF-8" /> <meta name="author" content="OpenLink Software Documentation Team ; " /> <meta name="copyright" content="OpenLink Software, 1999 - 2009" /> <meta name="keywords" content="" /> <meta name="GENERATOR" content="OpenLink XSLT Team" /> </head> <body> <div id="header"> <a name="replsample" /> <img src="../images/misc/logo.jpg" alt="" /> <h1>11. 
Data Replication, Synchronization and Transformation Services</h1> </div> <div id="navbartop"> <div> <a class="link" href="repl.html">Chapter Contents</a> | <a class="link" href="replexamples.html" title="Transactional Replication Example">Prev</a> | <a class="link" href="webappdevelopment.html" title="Web Application Development">Next</a> </div> </div> <div id="currenttoc"> <form method="post" action="/doc/adv_search.vspx"> <div class="search">Keyword Search: <br /> <input type="text" name="q" /> <input type="submit" name="go" value="Go" /> </div> </form> <div> <a href="http://www.openlinksw.com/">www.openlinksw.com</a> </div> <div> <a href="http://docs.openlinksw.com/">docs.openlinksw.com</a> </div> <br /> <div> <a href="index.html">Book Home</a> </div> <br /> <div> <a href="contents.html">Contents</a> </div> <div> <a href="preface.html">Preface</a> </div> <br /> <div class="selected"> <a href="repl.html">Data Replication, Synchronization and Transformation Services</a> </div> <br /> <div> <a href="replintro.html">Introduction</a> </div> <div> <a href="SNAPSHOT.html">Snapshot Replication</a> </div> <div> <a href="proctransrepl.html">Transactional Replication </a> </div> <div> <a href="SCHEDULER.html">Virtuoso scheduler</a> </div> <div> <a href="replexamples.html">Transactional Replication Example</a> </div> <div class="selected"> <a href="replsample.html">Replication Logger Sample</a> <div> <a href="#loggercfg" title="Configuration of the Sample">Configuration of the Sample</a> <a href="#loggersync" title="Synchronization">Synchronization</a> <a href="#runninglogger" title="Running the Sample">Running the Sample</a> <a href="#loggerdynamics" title="Notes on the Sample's Dynamics">Notes on the Sample's Dynamics</a> </div> </div> <br /> </div> <div id="text"> <a name="replsample" /> <h2>11.6. Replication Logger Sample</h2> <p> The logger directory in the samples in the distribution contains a simple load balancing sample. 
It implements a simplified web site hit log that maintains a hit count per user name and per origin IP address. </p>
<p> The transaction replicated between the servers therefore consists of incrementing an IP's hit count and then incrementing a user's hit count. If either the IP or the user does not yet have a count, a row is added with a count of 1. The transaction is then logged for replication, so that all servers get all hits, no matter which of the replicating servers processes the hit. </p>
<div> <pre class="programlisting">
create table wl_ip_cnt (ic_ip varchar, ic_cnt integer,
  primary key (ic_ip));
</pre> </div>
<div> <pre class="programlisting">
create table wl_user (wu_user varchar, wu_cnt integer,
  primary key (wu_user));
</pre> </div>
<div> <pre class="programlisting">
create procedure wl_hit_repl (in ip varchar, in usr varchar)
{
  set isolation = 'serializable';
  -- Increment the count for this IP, or create the row on its first hit.
  update wl_ip_cnt set ic_cnt = ic_cnt + 1 where ic_ip = ip;
  if (0 = row_count ())
    insert into wl_ip_cnt (ic_ip, ic_cnt) values (ip, 1);
  -- Same for the user name.
  update wl_user set wu_cnt = wu_cnt + 1 where wu_user = usr;
  if (0 = row_count ())
    insert into wl_user (wu_user, wu_cnt) values (usr, 1);
}
</pre> </div>
<div> <pre class="programlisting">
create procedure wl_hit (in ip varchar, in usr varchar)
{
  -- Apply the hit locally, then log the call on the 'hits' publication.
  wl_hit_repl (ip, usr);
  repl_text ('hits', 'wl_hit_repl (?, ?)', ip, usr);
}
</pre> </div>
<p> The application client calls wl_hit on one of the mutually replicating servers to log an event. The event's trace is then propagated to all other servers. The wl_hit_repl function does the actual work. The top-level function wl_hit calls it and also logs the call, with its arguments, on the local server's hits publication for distribution to the other servers. </p>
<a name="loggercfg" /> <h3>11.6.1. Configuration of the Sample</h3>
<p> The following sequence of calls can be used to define a network of four servers, each replicating every other server. 
For the sake of example, they are all on localhost and listen on ports 2001 through 2004. </p>
<div> <pre class="programlisting">
repl_server ('log1', 'localhost:2001');
repl_server ('log2', 'localhost:2002');
repl_server ('log3', 'localhost:2003');
repl_server ('log4', 'localhost:2004');
</pre> </div>
<div> <pre class="programlisting">
repl_publish ('hits', 'hits.log');
</pre> </div>
<div> <pre class="programlisting">
repl_subscribe ('log1', 'hits');
repl_subscribe ('log2', 'hits');
repl_subscribe ('log3', 'hits');
repl_subscribe ('log4', 'hits');
</pre> </div>
<p> First, all the servers are identified. Next, the local server declares that it has a publication named 'hits'. It then subscribes to the hits publication of every server in the list. This includes a subscription to itself, which signals an error but has no other effect. </p>
<p> In this way all servers share one configuration. Each server knows which of the servers it is based on the DBName setting in its virtuoso.ini file. </p>
<br /> <a name="loggersync" /> <h3>11.6.2. Synchronization</h3>
<div> <pre class="programlisting">
create procedure log_sync ()
{
  -- Walk every replication account known to this server.
  for select SERVER, ACCOUNT from SYS_REPL_ACCOUNTS do
    {
      -- Skip the local server's own publication.
      if (SERVER &lt;&gt; repl_this_server ())
        {
          declare err, msg varchar;
          err := '00000';
          -- exec traps errors raised by repl_sync, e.g. an unreachable server.
          exec ('repl_sync (?, ?, ?, ?)', err, msg,
                vector (SERVER, ACCOUNT, 'dba', 'dba'), 0);
        }
    }
}
</pre> </div>
<p> This procedure goes through all subscriptions and requests a sync for each. Note that the repl_sync function is called inside exec to catch any exceptions, since a server may be unavailable, for example. For the sake of simplicity, the procedure supplies the literal default dba login 'dba', 'dba' as authentication. </p>
<p> The replication sample schedules a call to this function every minute as a background job. If all replication servers are online and either in sync or syncing, the function returns without delay or effect. Otherwise it will keep trying until it gets a connection. 
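</p>
<p> A minimal sketch of how such a one-minute background job could be registered. It assumes the SYS_SCHEDULED_EVENT table described in the Virtuoso scheduler section and a non-zero SchedulerInterval in virtuoso.ini. The event name 'log_sync' is illustrative, and the sample's own scripts may register the job differently. </p>
<div> <pre class="programlisting">
-- Assumption: SE_INTERVAL is expressed in minutes.
insert into SYS_SCHEDULED_EVENT (SE_NAME, SE_START, SE_INTERVAL, SE_SQL)
  values ('log_sync', now (), 1, 'log_sync ()');
</pre> </div>
<p> Deleting the row whose SE_NAME is 'log_sync' from the same table would stop the periodic resync attempts. 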
</p> <br /> <a name="runninglogger" /> <h3>11.6.3. Running the Sample</h3>
<p> The logger directory contains scripts for initializing, starting and stopping the servers and for generating load. </p>
<p> <strong>log_init.sh</strong> - Creates the databases, with tables and procedures loaded, in the l1, l2, l3 and l4 subdirectories. </p>
<p> <strong>log_start.sh</strong> - Starts the 4 servers and leaves them running in the background. </p>
<p> <strong>log_shut.sh</strong> - Shuts down the 4 test servers. </p>
<p> <strong>hits.sh</strong> &lt;hits-per-hour&gt; &lt;no-of-hits&gt; </p>
<p> Starts the hits program on each of the 4 servers. The first command line argument gives the test transaction rate for each client and the second gives the duration as a transaction count. </p>
<div> <pre class="screen">
hits &lt;dsn&gt; &lt;uid&gt; &lt;pwd&gt; &lt;hits-per-hour&gt; &lt;no-of-hits&gt;
</pre> </div>
<p> The hits executable repeatedly calls wl_hit with random arguments and collects statistics on call times. If calls complete faster than the requested rate, the program periodically sleeps to keep the rate close to the requested one. It prints statistics every 1000 hits. </p>
<br /> <a name="loggerdynamics" /> <h3>11.6.4. Notes on the Sample's Dynamics</h3>
<p> When the network initially starts, all the publications are at level 0 and in sync. When transactions are fed into the network at a sufficiently slow rate, all the servers get to process all transactions in real time. Note that the structure is such that every server does every other server's work in addition to its own. Thus the insertion rate of the network cannot be expected to be higher than that of an individual server. However, read load can be spread across servers, so this type of configuration is effective for balancing query load but not for balancing update load. </p>
<p> As we increase the transaction rate at each server, we reach a point at which the queue of locally committed but un-replicated transactions grows faster than the other servers can absorb the feed. 
Each server will eventually disconnect all synced replication to stop its queue from growing. Once that queue, which no longer grows, has emptied, the subscribers are disconnected. At this point every server processes only its own load, without any other distraction. </p>
<p> Next, each server will notice that it is disconnected from the network and will attempt a resync as a result of the periodic scheduled call to log_sync. Each server will then re-establish a connection to every other server and start resyncing. The network will be in sync again if the per-server transaction rate slows down enough to allow the replicators to catch up. If this does not happen, the syncing can stay in progress indefinitely, until it either reaches sync or is terminated. </p>
<p> Typically a server's capacity for processing local transactions is greater than its capacity for replaying a replication feed. This is because a single thread is responsible for all replay activity, while many threads can process local transactions. </p>
<p> The net result of this scheduling policy is that even a heavily replicated network will scale to high peak loads and will automatically return to a synced state as soon as the peak is over. If guaranteed transaction-level synchronicity must be maintained between servers, the application should not be written using transactional replication but rather with distributed transactions, where each commit makes sure the transaction is fully processed on each participant before returning to the client. This is, however, up to several times slower and will stop the entire network if a single node fails. 
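</p>
<p> One way to check that the servers have converged after a peak is sketched below. It relies only on the two tables defined at the start of this section: every call to wl_hit_repl adds exactly one to a single row of each table, so both sums equal the number of hits applied on that server. </p>
<div> <pre class="programlisting">
-- Run on each server. Once in sync and idle, every server should
-- report the same two totals, and the two totals should be equal.
select sum (ic_cnt) from wl_ip_cnt;
select sum (wu_cnt) from wl_user;
</pre> </div>
<p> Totals that differ between servers while load is still being applied only indicate that replication is still catching up. 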
</p> <br /> <table border="0" width="90%" id="navbarbottom"> <tr> <td align="left" width="33%"> <a href="replexamples.html" title="Transactional Replication Example">Previous</a> <br />Transactional Replication Example</td> <td align="center" width="34%"> <a href="repl.html">Chapter Contents</a> </td> <td align="right" width="33%"> <a href="webappdevelopment.html" title="Web Application Development">Next</a> <br />Contents of Web Application Development</td> </tr> </table> </div> <div id="footer"> <div>Copyright© 1999 - 2009 OpenLink Software All rights reserved.</div> <div id="validation"> <a href="http://validator.w3.org/check/referer"> <img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" /> </a> <a href="http://jigsaw.w3.org/css-validator/"> <img src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!" height="31" width="88" /> </a> </div> </div> </body> </html>