<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head profile="http://internetalchemy.org/2003/02/profile"> <link rel="foaf" type="application/rdf+xml" title="FOAF" href="http://www.openlinksw.com/dataspace/uda/about.rdf" /> <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" /> <meta name="dc.title" content="14. RDF Data Access and Data Management" /> <meta name="dc.subject" content="14. RDF Data Access and Data Management" /> <meta name="dc.creator" content="OpenLink Software Documentation Team ; " /> <meta name="dc.copyright" content="OpenLink Software, 1999 - 2009" /> <link rel="top" href="index.html" title="OpenLink Virtuoso Universal Server: Documentation" /> <link rel="search" href="/doc/adv_search.vspx" title="Search OpenLink Virtuoso Universal Server: Documentation" /> <link rel="parent" href="rdfandsparql.html" title="Chapter Contents" /> <link rel="prev" href="rdfsparqlgeospat.html" title="RDF and Geometry" /> <link rel="next" href="rdfnativestorageproviders.html" title="RDF Data Access Providers (Drivers)" /> <link rel="shortcut icon" href="../images/misc/favicon.ico" type="image/x-icon" /> <link rel="stylesheet" type="text/css" href="doc.css" /> <link rel="stylesheet" type="text/css" href="/doc/translation.css" /> <title>14. RDF Data Access and Data Management</title> <meta http-equiv="Content-Type" content="text/xhtml; charset=UTF-8" /> <meta name="author" content="OpenLink Software Documentation Team ; " /> <meta name="copyright" content="OpenLink Software, 1999 - 2009" /> <meta name="keywords" content="" /> <meta name="GENERATOR" content="OpenLink XSLT Team" /> </head> <body> <div id="header"> <a name="rdfperformancetuning" /> <img src="../images/misc/logo.jpg" alt="" /> <h1>14. 
RDF Data Access and Data Management</h1> </div> <div id="navbartop"> <div> <a class="link" href="rdfandsparql.html">Chapter Contents</a> | <a class="link" href="rdfsparqlgeospat.html" title="RDF and Geometry">Prev</a> | <a class="link" href="rdfnativestorageproviders.html" title="RDF Data Access Providers (Drivers)">Next</a> </div> </div> <div id="currenttoc"> <form method="post" action="/doc/adv_search.vspx"> <div class="search">Keyword Search: <br /> <input type="text" name="q" /> <input type="submit" name="go" value="Go" /> </div> </form> <div> <a href="http://www.openlinksw.com/">www.openlinksw.com</a> </div> <div> <a href="http://docs.openlinksw.com/">docs.openlinksw.com</a> </div> <br /> <div> <a href="index.html">Book Home</a> </div> <br /> <div> <a href="contents.html">Contents</a> </div> <div> <a href="preface.html">Preface</a> </div> <br /> <div class="selected"> <a href="rdfandsparql.html">RDF Data Access and Data Management</a> </div> <br /> <div> <a href="rdfdatarepresentation.html">Data Representation</a> </div> <div> <a href="rdfsparql.html">SPARQL</a> </div> <div> <a href="sparqlextensions.html">Extensions</a> </div> <div> <a href="rdfgraphsecurity.html">RDF Graphs Security</a> </div> <div> <a href="rdfviews.html">Linked Data Views over RDBMS Data Source</a> </div> <div> <a href="rdfrdfviewgnr.html">Automated Generation of RDF Views over Relational Data Sources</a> </div> <div> <a href="rdfviewsenterpr.html">Examples of Linked Data Views</a> </div> <div> <a href="rdfinsertmethods.html">RDF Insert Methods in Virtuoso</a> </div> <div> <a href="virtuososponger.html">RDFizer Middleware (Sponger)</a> </div> <div> <a href="virtuosospongerfacetinstall.html">Virtuoso Faceted Browser Installation and configuration</a> </div> <div> <a href="virtuosospongerfacent.html">Virtuoso Faceted Web Service</a> </div> <div> <a href="rdfiridereferencing.html">Linked Data</a> </div> <div> <a href="rdfsparqlrule.html">Inference Rules & Reasoning</a> </div> <div> <a 
href="rdfsparqlgeospat.html">RDF and Geometry</a> </div> <div class="selected"> <a href="rdfperformancetuning.html">RDF Performance Tuning</a> <div> <a href="#rdfperfgeneral" title="General">General</a> <a href="#rdfperfrdfscheme" title="RDF Index Scheme">RDF Index Scheme</a> <a href="#rdfperfindexes" title="Index Scheme Selection">Index Scheme Selection</a> <a href="#rdfperfcost" title="Erroneous Cost Estimates and Explicit Join Order">Erroneous Cost Estimates and Explicit Join Order</a> <a href="#rdfperfgetallgraphs" title="Get All Graphs">Get All Graphs</a> <a href="#rdfperfdumpandreloadgraphs" title="Dump and Reload Graphs">Dump and Reload Graphs</a> <a href="#rdfperfdumpandreloadgraphsn3" title="Dump RDF View Graph to n3">Dump RDF View Graph to n3</a> <a href="#rdfperfloading" title="Loading RDF">Loading RDF</a> <a href="#rdfperfsparul" title="Using SPARUL">Using SPARUL</a> <a href="#rdfperfgeneraldbpedia" title="DBpedia Benchmark">DBpedia Benchmark</a> <a href="#rdfstorebenchmarks" title="RDF Store Benchmarks">RDF Store Benchmarks</a> <a href="#fastapproxdiffandpatch" title="Fast Approximate RDF Graph Diff and Patch">Fast Approximate RDF Graph Diff and Patch</a> <a href="#rdb2rdftriggers" title="RDB2RDF Triggers">RDB2RDF Triggers</a> </div> </div> <div> <a href="rdfnativestorageproviders.html">RDF Data Access Providers (Drivers)</a> </div> <div> <a href="rdfgraphreplication.html">RDF Graph Replication</a> </div> <br /> </div> <div id="text"> <a name="rdfperformancetuning" /> <h2>14.15. RDF Performance Tuning</h2> <p>For RDF query performance, we have the following possible questions:</p> <ul> <li>Is the Virtuoso process properly configured to handle big data sets?</li> <li>Is the graph always specified?</li> <li>Are public web service endpoints protected against bad queries?</li> <li>Are there patterns where only a predicate is given?</li> <li>Is there a bad query plan because of cost model error?</li> </ul> <a name="rdfperfgeneral" /> <h3>14.15.1. 
General</h3> <p>When running with large data sets, one should configure the Virtuoso process to use between 3/5 and 2/3 of system RAM and to stripe storage on all available disks. See the <a href="databaseadmsrv.html#VIRTINI">NumberOfBuffers</a>, <a href="databaseadmsrv.html#VIRTINI">MaxDirtyBuffers</a>, and <a href="databaseadmsrv.html#VIRTINI">Striping</a> INI file parameters. </p> <div> <pre class="programlisting">
; default installation
NumberOfBuffers = 2000
MaxDirtyBuffers = 1200
</pre> </div> <p>Typical sizes for the <strong>NumberOfBuffers</strong> and <strong>MaxDirtyBuffers</strong> (3/4 of NumberOfBuffers) parameters in the Virtuoso configuration file (virtuoso.ini) for various memory sizes are as follows, with each buffer consisting of 8K bytes: </p> <table class="data"> <caption>Table: 14.15.1.1. Recommended NumberOfBuffers and MaxDirtyBuffers</caption> <tr> <th class="data">System RAM</th> <th class="data">NumberOfBuffers</th> <th class="data">MaxDirtyBuffers</th> </tr> <tr> <td class="data">2 GB</td> <td class="data">170000</td> <td class="data">130000</td> </tr> <tr> <td class="data">4 GB</td> <td class="data">340000</td> <td class="data">250000</td> </tr> <tr> <td class="data">8 GB</td> <td class="data">680000</td> <td class="data">500000</td> </tr> <tr> <td class="data">16 GB</td> <td class="data">1360000</td> <td class="data">1000000</td> </tr> <tr> <td class="data">32 GB</td> <td class="data">2720000</td> <td class="data">2000000</td> </tr> <tr> <td class="data">48 GB</td> <td class="data">4000000</td> <td class="data">3000000</td> </tr> <tr> <td class="data">64 GB</td> <td class="data">5450000</td> <td class="data">4000000</td> </tr> </table> <br /> <br /> <a name="rdfperfrdfscheme" /> <h3>14.15.2. RDF Index Scheme</h3> <p> Starting with version 6.00.3126, the default RDF index scheme consists of 2 full indices over RDF quads plus 3 partial indices.
This index scheme is well suited to all kinds of workloads, regardless of whether queries generally specify a graph. </p> <p> Alternate indexing schemes are possible but are generally not needed. For upgrading old databases with a different index scheme, see the corresponding documentation. </p> <p> The index scheme consists of the following indices: </p> <ul> <li> <strong>PSOG</strong> - primary key.</li> <li> <strong>POGS</strong> - bitmap index for lookups on object value.</li> <li> <strong>SP</strong> - partial index for cases where only S is specified.</li> <li> <strong>OP</strong> - partial index for cases where only O is specified.</li> <li> <strong>GS</strong> - partial index for cases where only G is specified.</li> </ul> <p> This index scheme is created by the following statements: </p> <div> <pre class="programlisting">
create table DB.DBA.RDF_QUAD (
  G IRI_ID_8, S IRI_ID_8, P IRI_ID_8, O any,
  primary key (P, S, O, G) )
alter index RDF_QUAD on DB.DBA.RDF_QUAD partition (S int (0hexffff00));

create distinct no primary key ref bitmap index RDF_QUAD_SP on RDF_QUAD (S, P) partition (S int (0hexffff00));
create bitmap index RDF_QUAD_POGS on RDF_QUAD (P, O, G, S) partition (O varchar (-1, 0hexffff));
create distinct no primary key ref bitmap index RDF_QUAD_GS on RDF_QUAD (G, S) partition (S int (0hexffff00));
create distinct no primary key ref index RDF_QUAD_OP on RDF_QUAD (O, P) partition (O varchar (-1, 0hexffff));
</pre> </div> <p> The idea is to favor queries where the predicate is specified in triple patterns. The entire quad can be efficiently accessed when P and either or both of S and O are known. This has the advantage of clustering data by predicate, which improves working set: a page read from disk will only have entries pertaining to the same predicate, so the chances of accessing other entries of the page are higher than if the page held values for arbitrary predicates.
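</p> <p>As an illustration (the graph and IRIs below are hypothetical), a pattern that binds P and O is answered from the POGS bitmap index, while a pattern that binds S and P goes straight to the PSOG primary key:</p> <div> <pre class="programlisting">
-- P and O given: POGS yields the matching S (and G)
SPARQL
PREFIX ex: <http://example.org/>
SELECT ?s FROM <http://example.org/g>
WHERE { ?s rdf:type ex:Person };

-- S and P given: PSOG yields the O
SPARQL
PREFIX ex: <http://example.org/>
SELECT ?o FROM <http://example.org/g>
WHERE { ex:person1 ex:knows ?o };
</pre> </div> <p>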
For less frequent cases where only S is known, as in DESCRIBE, the distinct P's of the S are found in the SP index. These SP pairs are then used for accessing the PSOG index to get the O and G. For cases where only the G is known, as in dropping a graph, the distinct S's of the G are found in the GS index. The P's of each S are then found in the SP index, after which the whole quad is found in the PSOG index. </p> <p> The SP, OP and GS indices do not store duplicates. If an S has many values of a P, there is only one entry. Entries are not deleted from SP, OP or GS. This does not lead to erroneous results, since a full index, that is, either POGS or PSOG, is always consulted in order to know whether a quad actually exists. When updating data, most often a graph is entirely dropped and a substantially similar graph is inserted in its place; the SP, OP and GS indices then stay relatively unaffected. </p> <p> Still, over time, especially if there are frequent updates and values do not repeat between consecutive states, the SP, OP and GS indices will accumulate stale entries, which may affect performance. Dropping and recreating the index will then remedy the situation. </p> <p> For cases where this is not practical, the index scheme should have only full indices, i.e. each key holds all columns of the primary key of the quad. This will be the case if the distinct no primary key ref options are not specified in the create index statements. In such cases all indices remain in strict sync across deletes. </p> <p> Many RDF workloads have bulk load and read intensive access patterns with few deletes, and the default index scheme is optimized for these. In such situations this scheme offers significant space savings, resulting in a better working set. Typically this layout takes 60-70% of the space of a layout with 4 full indices. </p> <br /> <a name="rdfperfindexes" /> <h3>14.15.3.
Index Scheme Selection</h3> <p>The indexes in place on the RDF_QUAD table can greatly affect the performance of SPARQL queries, as can be determined by running the STATISTICS command on the table as follows: </p> <div> <pre class="programlisting"> SQL> STATISTICS DB.DBA.RDF_QUAD; Showing SQLStatistics of table(s) 'DB.DBA.RDF_QUAD' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME NON_UNIQUE INDEX_QUALIFIER INDEX_NAME TYPE SEQ_IN_INDEX COLUMN_NAME COLLATION CARDINALITY PAGES FILTER_CONDITION VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR SMALLINT SMALLINT VARCHAR VARCHAR INTEGER INTEGER VARCHAR _______________________________________________________________________________ DB DBA RDF_QUAD NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL DB DBA RDF_QUAD 0 DB RDF_QUAD 3 1 P NULL NULL NULL NULL DB DBA RDF_QUAD 0 DB RDF_QUAD 3 2 S NULL NULL NULL NULL DB DBA RDF_QUAD 0 DB RDF_QUAD 3 3 O NULL NULL NULL NULL DB DBA RDF_QUAD 0 DB RDF_QUAD 3 4 G NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_GS 3 1 G NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_GS 3 2 S NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_OP 3 1 O NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_OP 3 2 P NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 1 P NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 2 O NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 3 G NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 4 S NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_SP 3 1 S NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_SP 3 2 P NULL NULL NULL NULL 15 Rows. -- 24 msec. SQL> </pre> </div> <p>With only one index (OGPS) created by default, if the graph is always given, as with one or more <strong>FROM</strong> or <strong>FROM NAMED</strong> clauses, and there are no patterns where only graph and predicate are given, then the default indices should be sufficient. 
If predicate and graph are given but subject is not, then it is sometimes useful to add:</p> <div> <pre class="programlisting">
CREATE BITMAP INDEX RDF_QUAD_PGOS ON DB.DBA.RDF_QUAD (G, P, O, S)
  PARTITION (O VARCHAR (-1, 0hexffff));
</pre> </div> <div class="note"> <div class="notetitle">Note:</div> <p>If the server version is pre-5.0.7, leave out the partitioning clause.</p> </div> <p>Making the PGOS index can help in some cases even if it is not readily apparent from the queries that one is needed. This is so, for example, if the predicate by itself is selective, i.e. there is a predicate that occurs in only a few triples.</p> <p>If the graph itself is not given in the queries, then the default index scheme will be unworkable. For this, the appropriate scheme is:</p> <div> <pre class="programlisting">
CREATE TABLE RDF_QUAD (
  G IRI_ID_8, S IRI_ID_8, P IRI_ID_8, O ANY,
  PRIMARY KEY (S, P, O, G) )
ALTER INDEX RDF_QUAD ON RDF_QUAD PARTITION (S INT (0hexffff00));

CREATE BITMAP INDEX RDF_QUAD_OPGS ON DB.DBA.RDF_QUAD (O, P, G, S) PARTITION (O VARCHAR (-1, 0hexffff));
CREATE BITMAP INDEX RDF_QUAD_POGS ON DB.DBA.RDF_QUAD (P, O, G, S) PARTITION (O VARCHAR (-1, 0hexffff));
CREATE BITMAP INDEX RDF_QUAD_GPOS ON DB.DBA.RDF_QUAD (G, P, O, S) PARTITION (O VARCHAR (-1, 0hexffff));
</pre> </div> <div class="note"> <div class="notetitle">Note:</div> <p>If the server version is pre-5.0.7, leave out the <strong>PARTITION</strong> clauses and the <strong>ALTER INDEX</strong> statement.</p> </div> <p>Public web service endpoints have proven to be sources of especially bad queries. While local application developers can obtain instructions from the database administrator and use ISQL access to the database in order to tune execution plans, "external" clients do not know the details of the configuration and often lack the appropriate skills.
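</p> <p>One basic safeguard, independent of query tuning, is to cap what endpoint requests may cost. The [SPARQL] section of virtuoso.ini provides limits for this; the values below are illustrative only and should be adjusted to the deployment:</p> <div> <pre class="programlisting">
[SPARQL]
ResultSetMaxRows           = 10000
MaxQueryCostEstimationTime = 400   ; seconds
MaxQueryExecutionTime      = 60    ; seconds
</pre> </div> <p>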
The most common problem is that public endpoints usually get requests that do not mention the required graph, because those queries were initially written for use with other triple stores. If the web service provides access to a single graph (or to a short list of graphs), then it is strongly recommended to configure it by adding a row to <strong>DB.DBA.SYS_SPARQL_HOST</strong>:</p> <div> <pre class="programlisting">
CREATE TABLE "DB"."DBA"."SYS_SPARQL_HOST" (
  "SH_HOST" VARCHAR,         -- host pattern
  "SH_GRAPH_URI" VARCHAR,    -- default graph
  "SH_USER_URI" VARCHAR,     -- reserved for any use in applications
  "SH_DEFINES" LONG VARCHAR, -- list of 'define xx:yy "zz" ' defines as a string
  PRIMARY KEY ("SH_HOST") );
</pre> </div> <p>You can find detailed descriptions of the table columns <a href="rdfsparql.html#rdfdefaultgraph">here</a>.</p> <p>The idea is that a client who specifies a default graph in the request, or who uses named graphs and group graph patterns, probably knows what they are doing and will provide meaningful queries. If no graph names are specified, then the query will benefit from a preset default graph, because this gives the compiler more indexes to choose from -- indexes that begin with G.</p> <p>Sometimes a web service endpoint is used to access the data of only one application, not all data in the system. In that case one may wish to declare a separate storage that consists only of the RDF Views made by that application and define <strong>input:storage</strong> in the appropriate row of <strong>DB.DBA.SYS_SPARQL_HOST</strong>.</p> <br /> <a name="rdfperfcost" /> <h3>14.15.4. Erroneous Cost Estimates and Explicit Join Order</h3> <p>The selectivity of triple patterns is determined at query compile time by sampling the data. It is possible for this sampling to produce misleading estimates.
To see if the cardinality guesses are generally valid, look at the query plan with <a href="fn_explain.html">explain ()</a>.</p> <p>Below is a sample from the LUBM qualification data set in the Virtuoso distribution. After running <strong>make test</strong> in <strong>binsrc/test/lubm</strong>, there is a loaded database with the data. Start a server in the same directory to see the data.</p> <div> <pre class="programlisting"> SQL> EXPLAIN ('SPARQL PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#> SELECT * FROM <lubm> WHERE { ?x rdf:type ub:GraduateStudent } '); REPORT VARCHAR _______________________________________________________________________________ { Precode: 0: $25 "callret" := Call __BOX_FLAGS_TWEAK (<constant (lubm)>, <constant (1)>) 5: $26 "lubm" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($25 "callret") 12: $27 "callret" := Call __BOX_FLAGS_TWEAK (<constant (http://www.w3.org/1999/02/22-rdf-syntax-ns#type)>, <constant (1)>) 17: $28 "-ns#type" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($27 "callret") 24: $29 "callret" := Call __BOX_FLAGS_TWEAK (<constant (http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#GraduateStudent)>, <constant (1)>) 29: $30 "owl#GraduateStudent" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($29 "callret") 36: BReturn 0 from DB.DBA.RDF_QUAD by RDF_QUAD_OGPS 1.9e+03 rows Key RDF_QUAD_OGPS ASC ($32 "s-3-1-t0.S") <col=415 O = $30 "owl#GraduateStudent"> , <col=412 G = $26 "lubm"> , <col=414 P = $28 "-ns#type"> row specs: <col=415 O LIKE <constant (T)>> Current of: <$34 "<DB.DBA.RDF_QUAD s-3-1-t0>" spec 5> After code: 0: $35 "x" := Call ID_TO_IRI ($32 "s-3-1-t0.S") 5: BReturn 0 Select ($35 "x", <$34 "<DB.DBA.RDF_QUAD s-3-1-t0>" spec 5>) } 22 Rows. -- 1 msec. </pre> </div> <p>This finds the graduate student instances in the LUBM graph. First the query converts the IRI literals to IDs. Then, using a match of OG on OGPS, it finds the IRIs of the graduate students. 
Then it converts the IRI ID back to its string form.</p> <p>The cardinality estimate of 1.9e+03 rows is on the FROM line.</p> <p>Doing an EXPLAIN () on the queries will show the cardinality estimates. To drill down further, one can split the query into smaller chunks and see the estimates for these, down to the level of a single triple pattern. To indicate a variable that is bound but whose value is not a literal known at compile time, one can use the parameter marker <strong>??</strong>.</p> <div> <pre class="programlisting">
SQL> EXPLAIN ('
SPARQL DEFINE sql:table-option "order"
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT * FROM <lubm>
WHERE { ?x rdf:type ?? }
');
</pre> </div> <p>This will not know the type but will know that a type will be provided. So instead of guessing 1900 matches, this will guess a smaller number, which is obviously less precise. Thus literals are generally better.</p> <p>In some cases, generally to work around an optimization error, one can specify an explicit <strong>JOIN</strong> order. This is done with the <strong>sql:select-option "order"</strong> clause in the SPARQL query prefix.</p> <div> <pre class="programlisting">
SQL> SELECT SPARQL_to_sql_text ('
DEFINE sql:select-option "order"
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT * FROM <lubm>
WHERE { ?x rdf:type ub:GraduateStudent .
        ?x ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0> }
');
</pre> </div> <p>shows the SQL text with the order option at the end.</p> <p>If an estimate is radically wrong, then this should be reported as a bug.</p> <p>If there is a FROM with a KEY on the next line and no column specs, then this is a full table scan. The more columns are specified, the fewer rows will be passed to the next operation in the chain.
In the example above, there are three columns whose values are known before reading the table, and these columns are the leading columns of the index in use, so the column specs are:</p> <div> <pre class="programlisting">
<col=415 O = $30 "owl#GraduateStudent"> , <col=412 G = $26 "lubm"> , <col=414 P = $28 "-ns#type">
</pre> </div> <div class="note"> <div class="notetitle">Note:</div> <p>A KEY with only a row spec is a full table scan with the row spec applied as a filter. This is usually not good unless it is specifically intended.</p> </div> <p>If queries are compiled to make full table scans when this is not specifically intended, this should be reported as a bug. The explain () output and the query text should be included in the report.</p> <p>Consider:</p> <div> <pre class="programlisting">
SQL> EXPLAIN ('
SPARQL DEFINE sql:select-option "order, loop"
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT * FROM <lubm>
WHERE { ?x ub:takesCourse ?c .
        ?x rdf:type ub:GraduateStudent }
');
</pre> </div> <p>One will see in the output that the first table access retrieves everything in the LUBM graph that takes some course, and only later checks whether each such subject is a graduate student. This is obviously not the preferred order, but <strong>sql:select-option "order"</strong> forces the optimizer to join from left to right.</p> <p>It is very easy to end up with completely unworkable query plans in this manner, but if the optimizer really is in error, then this is the only way of overriding its preferences. The effect of <strong>sql:select-option</strong> is pervasive, extending inside unions, optionals, subqueries, etc., within the statement.</p> <p>We note that if, in the above query, both the course taken by the student and the type of the student are given, the query compilation will be, at least for all non-cluster cases, an index intersection.
This is not overridden by the sql:select-option clause, since an index intersection is always a safe guess, regardless of the correctness of the cardinality guesses of the patterns involved.</p> <br /> <a name="rdfperfgetallgraphs" /> <h3>14.15.5. Get All Graphs</h3> <p>In order to get the URIs of all graphs, one might use the Virtuoso <a href="fn_sparql_select_known_graphs.html">DB.DBA.SPARQL_SELECT_KNOWN_GRAPHS()</a> built-in function.</p> <br /> <a name="rdfperfdumpandreloadgraphs" /> <h3>14.15.6. Dump and Reload Graphs</h3> <p>To dump all graphs, optionally extending the script to exclude graphs of a certain type (e.g. by testing each graph with a SPARQL ASK query), you may use the following script sequence:</p> <div> <pre class="programlisting">
CREATE PROCEDURE dump_graphs
  ( IN dir VARCHAR := 'dumps'
  , IN file_length_limit INTEGER := 1000000000
  )
{
  DECLARE inx INT;
  inx := 1;
  SET ISOLATION = 'uncommitted';
  FOR ( SELECT * FROM ( SPARQL DEFINE input:storage ""
                        SELECT DISTINCT ?g { GRAPH ?g { ?s ?p ?o } .
                                             FILTER ( ?g != virtrdf: ) } ) AS sub OPTION ( LOOP ))
  DO
    {
      dump_one_graph ( "g", sprintf ('%s/graph%06d_', dir, inx), file_length_limit );
      inx := inx + 1;
    }
}
;

CREATE PROCEDURE dump_one_graph
  ( IN srcgraph VARCHAR
  , IN out_file VARCHAR
  , IN file_length_limit INTEGER := 1000000000
  )
{
  DECLARE file_name varchar;
  DECLARE env, ses any;
  DECLARE ses_len, max_ses_len, file_len, file_idx integer;
  SET ISOLATION = 'uncommitted';
  max_ses_len := 10000000;
  file_len := 0;
  file_idx := 1;
  file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
  string_to_file ( file_name || '.graph', srcgraph, -2 );
  string_to_file ( file_name,
                   sprintf ( '# Dump of graph <%s>, as of %s\n', srcgraph, CAST (NOW() AS VARCHAR) ),
                   -2 );
  env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
  ses := string_output ();
  FOR (SELECT * FROM ( SPARQL DEFINE input:storage ""
                       SELECT ?s ?p ?o { GRAPH `iri(?:srcgraph)` { ?s ?p ?o } } ) AS sub OPTION (LOOP))
  DO
    {
      http_ttl_triple (env, "s", "p", "o", ses);
      ses_len := length (ses);
      IF (ses_len >
max_ses_len)
        {
          file_len := file_len + ses_len;
          IF (file_len > file_length_limit)
            {
              http (' .\n', ses);
              string_to_file (file_name, ses, -1);
              file_len := 0;
              file_idx := file_idx + 1;
              file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
              string_to_file ( file_name,
                               sprintf ( '# Dump of graph <%s>, as of %s (part %d)\n', srcgraph, CAST (NOW() AS VARCHAR), file_idx),
                               -2 );
              env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
            }
          ELSE
            string_to_file (file_name, ses, -1);
          ses := string_output ();
        }
    }
  IF (LENGTH (ses))
    {
      http (' .\n', ses);
      string_to_file (file_name, ses, -1);
    }
}
;

create procedure load_graphs (in dir varchar := 'dumps/')
{
  declare arr any;
  declare g varchar;
  arr := sys_dirlist (dir, 1);
  log_enable (2, 1);
  foreach (varchar f in arr) do
    {
      if (f like '*.ttl')
        {
          declare continue handler for sqlstate '*'
            {
              log_message (sprintf ('Error in %s', f));
            };
          g := file_to_string (dir || '/' || f || '.graph');
          DB.DBA.TTLP_MT (file_open (dir || '/' || f), g, g, 255);
        }
    }
  exec ('checkpoint');
}
;
</pre> </div> <a name="rdfperfdumpandreloadgraphsexamples" /> <h4>14.15.6.1. Examples for Dump and Reload Graphs</h4> <div> <pre class="programlisting">
SQL> dump_one_graph ('http://daas.openlinksw.com/data#', './bbc_data_', 1000000000);
SQL> dump_one_graph ('http://daas.openlinksw.com/data#', './bbc_data_');
</pre> </div> <br /> <br /> <a name="rdfperfdumpandreloadgraphsn3" /> <h3>14.15.7. Dump RDF View Graph to n3</h3> <p>The RDF_QM_TREE_DUMP procedure and its associated procedures below are used for dumping one or more RDF View graphs in a Virtuoso server to a set of Turtle (.ttl) dataset files in the specified dump directory. The dump generation is made as fast as possible by grouping mappings by underlying tables, so that many properties from neighboring database columns can be extracted in one table scan. The size of the generated files is limited to 5MB. The dump process creates internal stored procedures; their texts are saved in the file .dump_procedures.sql in the dump directory for debugging purposes.
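</p> <p>For example, once the procedures below are loaded, typical invocations look as follows (the directories and graph IRI are illustrative; compare the test calls at the end of the procedure code):</p> <div> <pre class="programlisting">
-- dump every RDF View of the default quad storage
DB.DBA.RDF_QM_TREE_DUMP ('dump/demo', NULL, NULL, NULL);

-- dump only the quad maps that produce one particular graph
DB.DBA.RDF_QM_TREE_DUMP ('dump/tpch', 'http://localhost:8600/tpch', NULL, NULL);
</pre> </div> <p>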
</p> <p>Note that the dump directory must be included in the DirsAllowed parameter of the Virtuoso configuration file (e.g., virtuoso.ini), or the server will not be allowed to create or access the dataset file(s). </p> <p> The <a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoader">Virtuoso RDF bulk loader</a> scripts can then be used to load the dumped datasets for the RDF View graphs directly into a Virtuoso RDF QUAD store. </p> <a name="rdfperfdumpandreloadgraphsn3params" /> <h4>14.15.7.1. Parameters</h4> <ul> <li>in <strong>dest_dir</strong> VARCHAR - dump directory.</li> <li>in <strong>graph_iri</strong> VARCHAR - IRI of the graph to be dumped; triples from other graphs will be excluded. If NULL, then there is no restriction by graph.</li> <li>in <strong>storage</strong> VARCHAR - IRI of the quad map storage to use. NULL means use the default storage.</li> <li>in <strong>root</strong> VARCHAR - IRI of the quad map to use, e.g., an IRI of an RDF View (or its part). NULL means use all RDF Views of the storage (and the default mapping as well).</li> </ul> <br /> <a name="rdfperfdumpandreloadgraphsn3code" /> <h4>14.15.7.2.
Procedure Code</h4> <div> <pre class="programlisting"> CREATE PROCEDURE DB.DBA.RDF_QM_TREE_DUMP ( in dest_dir VARCHAR, in graph_iri VARCHAR := NULL, in storage VARCHAR := NULL, in root VARCHAR := NULL ) { DECLARE all_qms, grouped_qmvs, launcher_text ANY; DECLARE grp_ctr, qm_ctr, qm_count INTEGER; DECLARE sql_file, launcher_name VARCHAR; IF (NOT (dest_dir LIKE '%/')) dest_dir := dest_dir || '/'; sql_file := dest_dir || '.dump_procedures.sql'; IF (storage IS NULL) storage := 'http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage'; string_to_file ( sql_file, '-- This file contains procedure created by DB.DBA.RDF_QM_TREE_DUMP() for storage ' || COALESCE (storage, 'NULL') || ' and root quad map ' || COALESCE (root, 'NULL') || '\n\n', -2); all_qms := dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, root), 2); grouped_qmvs := DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (all_qms); launcher_name := 'RDF_QM_TREE_DUMP_BATCH_' || md5 (serialize (graph_iri) || storage || serialize (root)); launcher_text := string_output (); http ('CREATE PROCEDURE DB.DBA."' || launcher_name || '" (in dest_dir VARCHAR)\n{\n', launcher_text); FOR (grp_ctr := length (grouped_qmvs); grp_ctr > 0; grp_ctr := grp_ctr-2) { DECLARE tables, qms, proc_text ANY; DECLARE group_key, proc_name, dump_prefix, cmt VARCHAR; tables := grouped_qmvs [grp_ctr-2]; qms := grouped_qmvs [grp_ctr-1]; qm_count := length (qms); group_key := md5 (serialize (graph_iri) || storage || serialize (root) || serialize (tables)); proc_name := 'RDF_QM_TREE_DUMP_GRP_' || group_key; proc_text := string_output (); cmt := sprintf ('%d quad maps on join of', qm_count); FOREACH (VARCHAR t IN tables) DO cmt := cmt || ' ' || t; http (' -- ' || cmt || '\n', launcher_text); http (' DB.DBA."' || proc_name || '" (dest_dir);\n', launcher_text); http ('CREATE PROCEDURE DB.DBA."' || proc_name || '" (in dest_dir VARCHAR)\n', proc_text); http ('{\n', proc_text); http (' -- ' || cmt || '\n', proc_text); http (' DECLARE ses, env 
ANY;\n', proc_text); http (' DECLARE file_ctr, cmt_len INTEGER;\n', proc_text); http (' file_ctr := 0;\n', proc_text); http (' dbg_obj_princ (' || WS.WS.STR_SQL_APOS (cmt) || ', '', file '', file_ctr);\n', proc_text); http (' ses := string_output ();\n', proc_text); http (' http (' || WS.WS.STR_SQL_APOS ('#' || cmt || '\n') || ', ses);\n', proc_text); http (' env := VECTOR (dict_new (16000), 0, '''', '''', '''', 0, 0, 0, 0);\n', proc_text); http (' cmt_len := LENGTH (ses);\n', proc_text); http (' FOR (SPARQL DEFINE input:storage <' || storage || '>\n', proc_text); http (' SELECT ?s1, ?p1, ?o1\n', proc_text); IF (graph_iri IS NOT NULL) { http (' WHERE { GRAPH <', proc_text); http_escape (graph_iri, 12, proc_text, 1, 1); http ('> {\n', proc_text); } ELSE http (' WHERE { GRAPH ?g1 {\n', proc_text); FOR (qm_ctr := 0; qm_ctr < qm_count; qm_ctr := qm_ctr + 1) { IF (qm_ctr > 0) http (' UNION\n', proc_text); http (' { quad map <' || qms[qm_ctr] || '> { ?s1 ?p1 ?o1 } }\n', proc_text); } http (' } } ) DO {\n', proc_text); http (' http_ttl_triple (env, "s1", "p1", "o1", ses);\n', proc_text); http (' IF (LENGTH (ses) > 5000000)\n', proc_text); http (' {\n', proc_text); http (' http ('' .\\n'', ses);\n', proc_text); http (' string_to_file (sprintf (''%s' || group_key || '_%05d.ttl'', dest_dir, file_ctr), ses, -2);\n', proc_text); http (' file_ctr := file_ctr + 1;\n', proc_text); http (' dbg_obj_princ (' || WS.WS.STR_SQL_APOS (cmt) || ', '', file '', file_ctr);\n', proc_text); http (' ses := string_output ();\n', proc_text); http (' http (' || WS.WS.STR_SQL_APOS ('#' || cmt || '\n') || ', ses);\n', proc_text); http (' env := VECTOR (dict_new (16000), 0, '''', '''', '''', 0, 0, 0, 0);\n', proc_text); http (' }\n', proc_text); http (' }\n', proc_text); http (' IF (LENGTH (ses) > cmt_len)\n', proc_text); http (' {\n', proc_text); http (' http ('' .\\n'', ses);\n', proc_text); http (' string_to_file (sprintf (''%s' || group_key || '_%05d.ttl'', dest_dir, file_ctr), ses, -2);\n', 
proc_text); http (' }\n', proc_text); http ('}\n', proc_text); proc_text := string_output_string (proc_text); string_to_file (sql_file, proc_text || ';\n\n' , -1); EXEC (proc_text); } http ('}\n', launcher_text); launcher_text := string_output_string (launcher_text); string_to_file (sql_file, launcher_text || ';\n\n' , -1); EXEC (launcher_text); CALL ('DB.DBA.' || launcher_name)(dest_dir); } ; CREATE FUNCTION DB.DBA.RDF_QM_CONTENT_OF_QM_TREE ( in graph_iri VARCHAR := NULL, in storage VARCHAR := NULL, in root VARCHAR := NULL, in dict ANY := NULL ) returns ANY { DECLARE res, subqms any; DECLARE graphiri varchar; graphiri := DB.DBA.JSO_SYS_GRAPH(); IF (storage IS NULL) storage := 'http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage'; DB.DBA.RDF_QM_ASSERT_STORAGE_FLAG (storage, 0); IF (dict IS NULL) dict := dict_new (); IF (root IS NULL) { subqms := ((SELECT DB.DBA.VECTOR_AGG (sub."qmiri") FROM ( SPARQL DEFINE input:storage "" SELECT DISTINCT (str(?qm)) AS ?qmiri WHERE { GRAPH `iri(?:graphiri)` { { `iri(?:storage)` virtrdf:qsUserMaps ?lst . ?lst ?p ?qm . FILTER (0 = bif:strstr (str(?p), str(rdf:_))) } UNION { `iri(?:storage)` virtrdf:qsDefaultMap ?qm . } } } ) AS sub ) ); FOREACH (varchar qmid IN subqms) DO DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, qmid, dict); RETURN dict; } DB.DBA.RDF_QM_ASSERT_JSO_TYPE (root, 'http://www.openlinksw.com/schemas/virtrdf#QuadMap'); IF (graph_iri IS NOT NULL AND EXISTS ((SPARQL DEFINE input:storage "" SELECT (1) WHERE { GRAPH `iri(?:graphiri)` { `iri(?:root)` virtrdf:qmGraphRange-rvrFixedValue ?g . FILTER (str (?g) != str(?:graph_iri)) } } ) ) ) RETURN dict; IF (NOT EXISTS ((SPARQL DEFINE input:storage "" SELECT (1) WHERE { GRAPH `iri(?:graphiri)` { `iri(?:root)` virtrdf:qmMatchingFlags virtrdf:SPART_QM_EMPTY . 
} } ) ) ) dict_put (dict, root, 1); subqms := ((SELECT DB.DBA.VECTOR_AGG (sub."qmiri") FROM ( SPARQL DEFINE input:storage "" SELECT DISTINCT (str(?qm)) as ?qmiri WHERE { GRAPH `iri(?:graphiri)` { `iri(?:root)` virtrdf:qmUserSubMaps ?lst . ?lst ?p ?qm . FILTER (0 = bif:strstr (str(?p), str(rdf:_))) } } ) AS sub ) ); FOREACH (VARCHAR qmid IN subqms) DO DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, qmid, dict); RETURN dict; } ; CREATE FUNCTION DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (in qms ANY) returns ANY { DECLARE res ANY; DECLARE ctr INTEGER; DECLARE graphiri VARCHAR; graphiri := DB.DBA.JSO_SYS_GRAPH(); res := dict_new (LENGTH (qms) / 20); FOREACH (VARCHAR qmiri IN qms) DO { DECLARE tbls, acc ANY; tbls := ((SELECT DB.DBA.VECTOR_AGG (sub."tbl") FROM (SELECT subsub."tbl" FROM ( SPARQL DEFINE input:storage "" SELECT DISTINCT ?tbl WHERE { GRAPH `iri(?:graphiri)` { { `iri(?:qmiri)` virtrdf:qmTableName ?tbl . } UNION { `iri(?:qmiri)` virtrdf:qmATables ?atbls . ?atbls ?p ?atbl . ?atbl virtrdf:qmvaTableName ?tbl } UNION { `iri(?:qmiri)` ?fldmap ?qmv . ?qmv virtrdf:qmvATables ?atbls . ?atbls ?p ?atbl . ?atbl virtrdf:qmvaTableName ?tbl . } } } ) subsub ORDER BY 1 ) AS sub ) ); acc := dict_get (res, tbls); IF (acc IS NULL) vectorbld_init (acc); vectorbld_acc (acc, qmiri); dict_put (res, tbls, acc); } res := dict_to_vector (res, 2); FOR (ctr := LENGTH (res); ctr > 0; ctr := ctr-2) { DECLARE acc ANY; acc := aref_set_0 (res, ctr-1); vectorbld_final (acc); aset_zap_arg (res, ctr-1, acc); } RETURN res; } ; --test dbg_obj_princ (DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (null), 2))); --test dbg_obj_princ (dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (null), 2)); --test DB.DBA.RDF_QM_TREE_DUMP ('dump/demo', null, null, null); --test DB.DBA.RDF_QM_TREE_DUMP ('dump/tpch', 'http://localhost:8600/tpch', null, null); </pre> </div> <br /> <br /> <a name="rdfperfloading" /> <h3>14.15.8. 
Loading RDF</h3> <p>There are many functions for loading RDF text, in both RDF/XML and Turtle.</p> <p>For loading RDF/XML, the best way is to split the data to be loaded into multiple streams and load these in parallel using <a href="fn_rdf_load_rdfxml.html">RDF_LOAD_RDFXML ()</a>. To avoid running out of rollback space with large files, and so that multiple concurrent loads do not interfere with each other, row-by-row autocommit mode should be enabled.</p> <p>For example: </p> <div> <pre class="programlisting">
log_enable (2);  -- switch row-by-row autocommit on and logging off for this session
DB.DBA.RDF_LOAD_RDFXML (file_to_string_output ('file.xml'), 'base_uri', 'target_graph');
-- more files here ...
checkpoint;
</pre> </div> <p>Executing a script like the above with isql loads the data. Because transaction logging is off, make a manual checkpoint at the end: there is no roll-forward log, so without the checkpoint the data would not be persisted across a server restart.</p> <p>If large amounts of data are to be loaded, run several such streams in parallel, for example 6 streams for 4 cores. Then, even if up to two threads are waiting for disk, there is still work for all cores.</p> <p>Having substantially more threads than processors or disks is not particularly useful.</p> <p>There are multithreaded load functions which load one file on multiple threads: <a href="rdfsparql.html#rdfapidataimportttlpmt">the DB.DBA.TTLP_MT() function</a> and <a href="rdfsparql.html#rdfapidataimportxmlttlpmt">the DB.DBA.RDF_LOAD_RDFXML_MT() function</a>. Experience shows, however, that loading multiple files with one thread per file is better.</p> <p>For loading Turtle, note that some platforms may have a non-reentrant Turtle parser, in which case only one load may run at a time. One can test for this by calling <a href="rdfsparql.html#rdfapidataimport">ttlp ()</a> from two sessions at the same time. 
If these do not execute concurrently, the best approach may be to try <a href="rdfsparql.html#rdfapidataimport">ttlp_mt</a> and see whether it runs faster than a single-threaded ttlp call.</p> <a name="rdfperfloadingutility" /> <h4>14.15.8.1. RDF Bulk Load Utility</h4> <p>The RDF loader utility facilitates parallel bulk loading of multiple RDF files. It maintains a database table containing the list of files to load and the status of each file: not loaded, loaded, or loaded with error. The table also records load start and end times.</p> <p>Using the utility requires a dba group login, and the file access control list in the virtuoso.ini file must be set up so that the Virtuoso server can open the files to be loaded.</p> <p>Files are added to the load list with the function <a href="fn_ld_dir.html">ld_dir</a>:</p> <div> <pre class="programlisting">
ld_dir (in dir_path varchar, in file_mask varchar, in target_graph varchar);
</pre> </div> <p>The file mask is a SQL LIKE pattern to match against the files in the directory. For example:</p> <div> <pre class="programlisting">
ld_dir ('/data8/2848260', '%.gz', 'http://bsbm.org');
</pre> </div> <p>registers for loading the RDF in all files ending in .gz in the directory given as the first parameter. 
The RDF would be loaded into the http://bsbm.org graph.</p> <p>If NULL is given for the graph, each file may go to a different graph, specified in a separate file with the name of the RDF source file plus the extension .graph.</p> <p>A .graph file contains the target graph URI without any other content or whitespace.</p> <p>The layout of the load_list table is as follows:</p> <div> <pre class="programlisting">
create table DB.DBA.LOAD_LIST (
  ll_file varchar,
  ll_graph varchar,
  ll_state int default 0,  -- 0 not started, 1 going, 2 done
  ll_started datetime,
  ll_done datetime,
  ll_host int,
  ll_work_time integer,
  ll_error varchar,
  primary key (ll_file))
alter index LOAD_LIST on DB.DBA.LOAD_LIST partition (ll_file varchar)
create index LL_STATE on DB.DBA.LOAD_LIST (ll_state, ll_file, ll_graph) partition (ll_state int)
;
</pre> </div> <p>This table may be checked at any time during a bulk load for the progress of the load. ll_state is 1 for files being loaded and 2 for files whose loading has finished. ll_error is NULL if the load finished without error; otherwise it holds the error message.</p> <p>In order to load the data from the files in load_list, run as dba:</p> <div> <pre class="programlisting">
DB.DBA.rdf_loader_run ();
</pre> </div> <p>One may run several of these commands on parallel sessions for better throughput.</p> <p>On a cluster one can do:</p> <div> <pre class="programlisting">
cl_exec ('rdf_ld_srv ()');
</pre> </div> <p>This will start one rdf_loader_run on each node of the cluster. 
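</p> <p>While a bulk load is in progress, the load_list table described above can be queried for a quick status summary. The following is a sketch using only the columns defined above:</p> <div> <pre class="programlisting">
-- Count files by state: 0 = not started, 1 = being loaded, 2 = done
SELECT ll_state, COUNT (*) FROM DB.DBA.LOAD_LIST GROUP BY ll_state;

-- List any files whose load finished with an error
SELECT ll_file, ll_error FROM DB.DBA.LOAD_LIST WHERE ll_error IS NOT NULL;
</pre> </div> <p>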
Note that in such a setting all the server processes must see the same files at the same path.</p> <p>In an isql session one may execute rdf_loader_run () & several times, forking a new isql for each such command, much as a Unix shell runs background jobs.</p> <p>Because this load is non-transactional and non-logged, one must do an explicit checkpoint after the load to guarantee a persistent state.</p> <p>On a single server do:</p> <div> <pre class="programlisting">
checkpoint;
</pre> </div> <p>On a cluster do:</p> <div> <pre class="programlisting">
cl_exec ('checkpoint');
</pre> </div> <p>The server(s) remain online and can process queries and transactions while a bulk load is in progress. Periodic checkpoints may occur during the load, but the state is guaranteed to be consistent only after a checkpoint run once all the bulk load threads have finished.</p> <p>A bulk load should not be forcibly stopped. To make a controlled stop, run:</p> <div> <pre class="programlisting">
rdf_load_stop ();
</pre> </div> <p>This will let the files being loaded at the time finish loading, but no new loads will start until explicitly started with <span class="computeroutput">rdf_loader_run ()</span>.</p> <p>Note especially that on a cluster the database will be inconsistent if one server process makes a checkpoint and another does not; a checkpoint must therefore be guaranteed on all processes. 
This is easily done with an isql script with the following content:</p> <div> <pre class="programlisting">
ld_dir ('/data8/2848260', '%.gz', 'http://bsbm.org');

-- Record CPU time
select getrusage ()[0] + getrusage ()[1];

rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
wait_for_children;
checkpoint;

-- Record CPU time
select getrusage ()[0] + getrusage ()[1];
</pre> </div> <p>For a cluster, the equivalent is:</p> <div> <pre class="programlisting">
ld_dir ('/data8/2848260', '%.gz', 'http://bsbm.org');
cl_exec ('DB.DBA.RDF_LD_SRV (2)');
cl_exec ('checkpoint');
</pre> </div> <p> <span class="computeroutput">rdf_loader_run ()</span> recognizes several file types, including .ttl, .nt, .xml, .rdf, .owl, .nq, .n4, and others. Internally the function uses <a href="fn_ttlp.html">DB.DBA.ttlp()</a> or <a href="fn_rdf_load_rdfxml.html">DB.DBA.rdf_load_rdfxml</a>, as appropriate.</p> <p>See <a href="rdfperformancetuning.html#rdfperfloadinglod">the next section</a> for a detailed description of the <span class="computeroutput">rdf_loader_run ()</span> function.</p> <br /> <a name="rdfperfloadinglod" /> <h4>14.15.8.2. Loading LOD RDF data</h4> <p>To load RDF data into a LOD instance, perform the following steps: </p> <ul> <li>Configure and start the cluster</li> <li>Execute the file: <div> <pre class="programlisting">
--
--  $Id: rdfandsparql.xml,v 1.95.2.47 2011/10/26 00:21:16 source Exp $
--
--  Alternate RDF index scheme for cases where G unspecified
--
--  This file is part of the OpenLink Software Virtuoso Open-Source (VOS)
--  project.
--
--  Copyright (C) 1998-2009 OpenLink Software
--
--  This project is free software; you can redistribute it and/or modify it
--  under the terms of the GNU General Public License as published by the
--  Free Software Foundation; only version 2 of the License, dated June 1991.
--
--  This program is distributed in the hope that it will be useful, but
--  WITHOUT ANY WARRANTY; without even the implied warranty of
--  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
--  General Public License for more details.
--
--  You should have received a copy of the GNU General Public License along
--  with this program; if not, write to the Free Software Foundation, Inc.,
--  51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
--

drop index RDF_QUAD_OGPS;
checkpoint;

create table R2 (G iri_id_8, S iri_id_8, P iri_id_8, O any, primary key (S, P, O, G))
alter index R2 on R2 partition (S int (0hexffff00));

log_enable (2);
insert into R2 (G, S, P, O) SELECT G, S, P, O from rdf_quad;
drop table RDF_QUAD;
alter table r2 rename RDF_QUAD;
checkpoint;

create bitmap index RDF_QUAD_OPGS on RDF_QUAD (O, P, G, S) partition (O varchar (-1, 0hexffff));
create bitmap index RDF_QUAD_POGS on RDF_QUAD (P, O, G, S) partition (O varchar (-1, 0hexffff));
create bitmap index RDF_QUAD_GPOS on RDF_QUAD (G, P, O, S) partition (O varchar (-1, 0hexffff));
checkpoint;
</pre> </div> </li> <li>Execute: <div> <pre class="programlisting">
SQL>cl_exec ('checkpoint');
</pre> </div> </li> <li>Execute ld_dir ('directory' , 'mask' , 'graph'), for example: <div> <pre class="programlisting">
SQL>ld_dir ('/dbs/data', '*.gz', 'http://dbpedia.org');
</pre> </div> </li> <li>Execute on every node, each with a separate client: <div> <pre class="programlisting">
SQL>rdf_loader_run();
</pre> </div> </li> </ul> <br /> <a name="rdfperfloadingunitpro" /> <h4>14.15.8.3. 
Loading UniProt RDF data</h4> <p>To load the UniProt data, create a function such as:</p> <div> <pre class="programlisting">
create function DB.DBA.UNIPROT_LOAD (in log_mode integer := 1)
{
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename1'), 'http://base_uri_1', 'destination_graph_1', log_mode, 3);
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename2'), 'http://base_uri_2', 'destination_graph_2', log_mode, 3);
  ...
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename9'), 'http://base_uri_9', 'destination_graph_9', log_mode, 3);
}
</pre> </div> <p>If you are starting from a blank database that can simply be dropped and re-created should an error be signaled, use it this way:</p> <div> <pre class="programlisting">
checkpoint;
checkpoint_interval (6000);
DB.DBA.UNIPROT_LOAD (0);
checkpoint;
checkpoint_interval (60);
</pre> </div> <p>If the database already contains important data and there is no way to stop it and take a backup before the load, then use:</p> <div> <pre class="programlisting">
checkpoint;
checkpoint_interval (6000);
DB.DBA.UNIPROT_LOAD ();
checkpoint;
checkpoint_interval (60);
</pre> </div> <p>Note that the 'number of threads' parameter of DB.DBA.RDF_LOAD_RDFXML_MT() counts only the threads used to process data from the file; an extra thread reads and parses the text, so for 4 CPU cores there is no need for a parameter value greater than 3. Three processing threads per parsing thread is usually a good ratio, because parsing is roughly three times faster than the rest of the loading, so the CPU load stays well balanced. If, for example, you are using 2 x Quad Xeon, you can choose between 8 single-threaded parsers or 2 parsers with 3 processing threads each. With 4 cores you may simply load file after file with 3 processing threads. 
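</p> <p>For instance, on an 8-core machine the "2 parsers with 3 processing threads each" option might look like the sketch below; the file names, base URIs, and graph names are placeholders:</p> <div> <pre class="programlisting">
-- Session 1: one parsing thread plus 3 processing threads
DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output ('file_a.rdf'), 'http://base_uri_a', 'graph_a', 0, 3);

-- Session 2, run concurrently from a separate isql client:
DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output ('file_b.rdf'), 'http://base_uri_b', 'graph_b', 0, 3);
</pre> </div> <p>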
The most important performance tuning is to set the [Parameters] section of the Virtuoso configuration file:</p> <div> <pre class="programlisting">
NumberOfBuffers    = 1000000
MaxDirtyBuffers    = 800000
MaxCheckpointRemap = 1000000
DefaultIsolation   = 2
</pre> </div> <p>Note: these numbers are reasonable for a Linux box with 16 GB RAM. When no operation as massive as loading a huge database is under way, you can usually set the values as:</p> <div> <pre class="programlisting">
NumberOfBuffers    = 1500000
MaxDirtyBuffers    = 1200000
MaxCheckpointRemap = 1500000
DefaultIsolation   = 2
</pre> </div> <div class="tip"> <div class="tiptitle">See Also:</div> <p> <a href="http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtConfigScale#Configuration%20Options">Virtuoso Configuration Options</a> </p> </div> <div class="tip"> <div class="tiptitle">Tip:</div> <p>After loading all the data you may therefore wish to shut down the server, adjust these settings, and start it again. If you have an ext2 or ext3 filesystem, it is better to keep enough free space on disk that it is no more than 80% full. When the disk is almost full, the database file may be allocated badly, resulting in a measurable loss of disk access speed. This is not specific to Virtuoso, but a common hint for all database-like applications doing random access to big files.</p> </div> <p>Here is an example awk script for splitting a big file into smaller ones:</p> <div> <pre class="programlisting">
BEGIN {
  file_part=1000
  e_line = "</rdf:RDF>"
  cur=0
  cur_o=0
  file=0
  part=file_part
}
{
  res_file_i="res/"FILENAME
  line=$0
  s=$1
  res_file=res_file_i"_"file".rdf"
  if (index (s, "</rdf:Description>") == 1) {
    cur=cur+1
    part=part-1
  }
  if (part > 0) {
    print line >> res_file
  }
  if (part == 0) {
    # print "===================== " cur
    print line >> res_file
    print e_line >> res_file
    close (res_file)
    file=file+1
    part=file_part
    res_file=res_file_i"_"file".rdf"
    system ("cp beg.txt " res_file)
  }
}
END { }
</pre> </div> <br /> <a name="rdfperfloadingdbpedia" /> <h4>14.15.8.4. 
Loading DBPedia RDF data</h4> <p>You can use the following script as an example of loading DBPedia RDF data into Virtuoso:</p> <div> <pre class="programlisting">
#!/bin/sh
PORT=$1
USER=$2
PASS=$3
file=$4
g=$5
LOGF=`basename $0`.log
if [ -z "$PORT" -o -z "$USER" -o -z "$PASS" -o -z "$file" -o -z "$g" ]
then
    echo "Usage: `basename $0` [DSN] [user] [password] [ttl-file] [graph-iri]"
    exit
fi
if [ ! -f "$file" -a ! -d "$file" ]
then
    echo "$file does not exist"
    exit 1
fi
mkdir READY 2>/dev/null
rm -f $LOGF $LOGF.*
echo "Starting..."
echo "Logging into: $LOGF"

DOSQL ()
{
    isql $PORT $USER $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="$1" > $LOGF
}

LOAD_FILE ()
{
    f=$1
    g=$2
    echo "Loading $f (`cat $f | wc -l` lines) `date \"+%H:%M:%S\"`" | tee -a $LOG
    DOSQL "ttlp_mt (file_to_string_output ('$f'), '', '$g', 17); checkpoint;" > $LOGF
    if [ $? != 0 ]
    then
        echo "An error occurred, please check $LOGF"
        exit 1
    fi
    line_no=`grep Error $LOGF | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
    newf=$f.part
    inx=1
    while [ ! -z "$line_no" ]
    do
        cat $f | awk "BEGIN { i = 1 } { if (i==$line_no) { print \$0; exit; } i = i + 1 }" >> bad.nt
        line_no=`expr $line_no + 1`
        echo "Retrying from line $line_no"
        echo "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ." > tmp.nt
        cat $f | awk "BEGIN { i = 1 } { if (i>=$line_no) print \$0; i = i + 1 }" >> tmp.nt
        mv tmp.nt $newf
        f=$newf
        mv $LOGF $LOGF.$inx
        DOSQL "ttlp_mt (file_to_string_output ('$f'), '', '$g', 17); checkpoint;" > $LOGF
        if [ $? != 0 ]
        then
            echo "An error occurred, please check $LOGF"
            exit 1
        fi
        line_no=`grep Error $LOGF | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
        inx=`expr $inx + 1`
    done
    rm -f $newf 2>/dev/null
    echo "Loaded. "
}

echo "======================================="
echo "Loading started."
echo "======================================="
if [ -f "$file" ]
then
    LOAD_FILE $file $g
    mv $file READY 2>> /dev/null
elif [ -d "$file" ]
then
    for ff in `find $file -name '*.nt'`
    do
        LOAD_FILE $ff $g
        mv $ff READY 2>> /dev/null
    done
else
    echo "The input is not a file or directory"
fi
echo "======================================="
echo "Final checkpoint."
DOSQL "checkpoint;" > temp.res
echo "======================================="
echo "Check bad.nt file for skipped triples."
echo "======================================="
exit 0
</pre> </div> <br /> <a name="rdfperfloadingbio2rdf" /> <h4>14.15.8.5. Loading Bio2RDF data</h4> <p>The shell script below was used to import files in N3 notation into OpenLink Virtuoso RDF storage.</p> <p>When a syntax error is hit, the script cuts the offending line out (saving it to bad.nt) and retries from the next line. It was used on Ubuntu Linux to import Bio2RDF and Freebase dumps.</p> <p>Note that it uses gawk, which must be available on the system. Recovery also needs additional free disk space, up to the size of the original file.</p> <div> <pre class="programlisting">
#!/bin/bash
PASS=$1
f=$2
g=$3
# Usage
if [ -z "$PASS" -o -z "$f" -o -z "$g" ]
then
    echo "Usage: $0 [password] [ttl-file] [graph-iri]"
    exit
fi
if [ ! -f "$f" ]
then
    echo "$f does not exist"
    exit
fi
# Your port here
PORT=1111 #`inifile -f dbpedia.ini -s Parameters -k ServerPort`
if test -z "$PORT"
then
    echo "Cannot find INI and inifile command"
    exit
fi
# Initial run
isql $PORT dba $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="ttlp_mt (file_to_string_output ('$f'), '', '$g'); checkpoint;" > $0.log
# If disconnect etc.
if [ $? != 0 ]
then
    echo "An error occurred, please check $0.log"
    exit
fi
# Check for error
line_no=`grep Error $0.log | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
newf=$f.part
inx=1
# Error recovery
while [ ! -z "$line_no" ]
do
    cat $f | awk "BEGIN { i = 0 } { if (i==$line_no) { print \$0; exit; } i = i + 1 }" >> bad.nt
    line_no=`expr $line_no + 1`
    echo "Retrying from line $line_no"
    cat $f | awk "BEGIN { i = 0 } { if (i>=$line_no) print \$0; i = i + 1 }" > tmp.nt
    mv tmp.nt $newf
    f=$newf
    mv $0.log $0.log.$inx
    # Run the recovered part
    isql $PORT dba $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="ttlp_mt (file_to_string_output ('$f'), '', '$g'); checkpoint;" > $0.log
    if [ $? != 0 ]
    then
        echo "An error occurred, please check $0.log"
        exit
    fi
    line_no=`grep Error $0.log | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
    inx=`expr $inx + 1`
done
</pre> </div> <br /> <br /> <a name="rdfperfsparul" /> <h3>14.15.9. Using SPARUL</h3> <p>Since SPARUL updates are generally not meant to be transactional, it is best to run them in <a href="fn_log_enable.html">log_enable (2)</a> mode, which commits each operation as it is done. This prevents running out of rollback space. For bulk updates, transaction logging can also be turned off; if so, do a manual checkpoint after the operation to ensure persistence across a server restart, since there is no roll-forward log.</p> <p>To have a roll-forward log and row-by-row autocommit, one may use <a href="fn_log_enable.html">log_enable (3)</a>. This writes constantly to the log, which takes extra time. Having no logging and making a checkpoint when the whole work is finished is faster.</p> <p>Many SPARUL operations can be run in parallel in this way. If they are independent with respect to their input and output, they can run in parallel, and row-by-row autocommit ensures they do not end up waiting for each other's locks.</p> <br /> <a name="rdfperfgeneraldbpedia" /> <h3>14.15.10. DBpedia Benchmark</h3> <p>We ran the DBpedia benchmark queries again with different configurations of Virtuoso. Comparing numbers given by different parties is a constant problem. 
In the case reported here, we loaded the full DBpedia 3, all languages, about 198M triples, onto Virtuoso v5 and Virtuoso Cluster v6, all on the same 4-core 2GHz Xeon with 8G RAM. All databases were striped on 6 disks. The Cluster configuration had 4 processes in the same box. We ran the queries in two variants: </p> <ul> <li>With the graph specified in the SPARQL FROM clause, using the default indices.</li> <li>With no graph specified anywhere, using an alternate indexing scheme.</li> </ul> <p>The times below are for the sequence of 5 queries. One query in the set specifies no condition on S or O, only on P, and thus cannot be answered with the default indices on Virtuoso v5. With Virtuoso Cluster v6 it can, because v6 is more space efficient. So we added the index:</p> <div> <pre class="programlisting">
create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s);
</pre> </div> <table class="data"> <tr> <th class="data" /> <th class="data">Virtuoso v5 with gspo, ogps, pogs</th> <th class="data">Virtuoso Cluster v6 with gspo, ogps</th> <th class="data">Virtuoso Cluster v6 with gspo, ogps, pogs</th> </tr> <tr> <td class="data">cold</td> <td class="data">210 s</td> <td class="data">136 s</td> <td class="data">33.4 s</td> </tr> <tr> <td class="data">warm</td> <td class="data">0.600 s</td> <td class="data">4.01 s</td> <td class="data">0.628 s</td> </tr> </table> <br /> <p>Now let us do it without a graph being specified. Note that alter index is valid for v6 or higher. 
For all platforms, we drop any existing indices, and:</p> <div> <pre class="programlisting">
create table r2 (g iri_id_8, s iri_id_8, p iri_id_8, o any, primary key (s, p, o, g))
alter index R2 on R2 partition (s int (0hexffff00));

log_enable (2);
insert into r2 (g, s, p, o) SELECT g, s, p, o from rdf_quad;
drop table rdf_quad;
alter table r2 rename RDF_QUAD;

create bitmap index rdf_quad_opgs on rdf_quad (o, p, g, s) partition (o varchar (-1, 0hexffff));
create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s) partition (o varchar (-1, 0hexffff));
create bitmap index rdf_quad_gpos on rdf_quad (g, p, o, s) partition (o varchar (-1, 0hexffff));
</pre> </div> <p>The code is identical for v5 and v6, except that with v5 we use iri_id (32 bit) for the type, not iri_id_8 (64 bit). We note that we run out of IDs with v5 at around a few billion triples, so with v6 we have double the ID length and still manage to be vastly more space efficient.</p> <p>With the above 4 indices, we can query the data in pretty much any combination without hitting a full scan of any index. We note that all indices that do not begin with s end with s as a bitmap. For data such as DBpedia, this takes about 60% of the space of a non-bitmap index.</p> <p>If you intend to do completely arbitrary RDF queries in Virtuoso, chances are you are best off with the above index scheme.</p> <table class="data"> <tr> <th class="data" /> <th class="data">Virtuoso v5 with gspo, ogps, pogs</th> <th class="data">Virtuoso Cluster v6 with gspo, ogps, pogs</th> </tr> <tr> <td class="data">warm</td> <td class="data">0.595 s</td> <td class="data">0.617 s</td> </tr> </table> <br /> <p>The cold times were about the same as above, so they are not reproduced.</p> <p>It is in the SPARQL spirit to specify a graph, and for pretty much any application there are entirely sensible ways of keeping the data in graphs and specifying which ones are concerned by queries. 
This is why Virtuoso is set up for this by default.</p> <p>On the other hand, in the open web scenario, dealing with an unknown and large number of graphs, enumerating graphs is not possible, and questions like "which graph from which source asserts x" become relevant. These are two distinct use cases which warrant different setups of the database, simple as that.</p> <p>The latter use case is not really within the SPARQL spec, so implementations may or may not support it.</p> <p>Once the indices are right, there is no difference between specifying a graph and not specifying a graph for the queries considered. With more complex queries, specifying a graph or set of graphs does allow some optimizations that cannot be done with no graph specified. For example, bitmap intersections are possible only when all leading key parts are given.</p> <p>The best warm cache time is with v5; the five queries run in under 600 ms after the first go. This is noted to show that all-in-memory with a single thread of execution is hard to beat.</p> <p>Cluster v6 performs the same queries in 623 ms. What is gained in parallelism is lost in latency if all operations complete in microseconds. On the other hand, Cluster v6 leaves v5 in the dust in any situation with less than a 100% hit rate. This is because parallelism brings real benefit once operations take longer than a few microseconds, as with disk reads. Cluster v6 also has substantially better data layout on disk, as well as fewer pages to load for the same content.</p> <p>This makes it possible to run the queries without the pogs index on Cluster v6 even when v5 takes prohibitively long.</p> <p>The point is to have a lot of RAM and a space-efficient data representation.</p> <p>For reference, the query texts specifying the graph are below. To run without specifying the graph, just drop the FROM <http://dbpedia.org> from each query. 
The returned row counts are indicated below each query's text.</p> <div> <pre class="programlisting">
SQL>SPARQL SELECT ?p ?o FROM <http://dbpedia.org> WHERE { <http://dbpedia.org/resource/Metropolitan_Museum_of_Art> ?p ?o . };
p                                                o
VARCHAR                                          VARCHAR
_______________________________________________________________________________

http://www.w3.org/1999/02/22-rdf-syntax-ns#type  http://umbel.org/umbel/ac/Artifact
http://www.w3.org/1999/02/22-rdf-syntax-ns#type  http://dbpedia.org/class/yago/MuseumsInNewYorkCity
http://www.w3.org/1999/02/22-rdf-syntax-ns#type  http://dbpedia.org/class/yago/ArtMuseumsAndGalleriesInTheUnitedStates
http://www.w3.org/1999/02/22-rdf-syntax-ns#type  http://dbpedia.org/class/yago/Museum103800563
..
-- 335 rows

SQL>SPARQL PREFIX p: <http://dbpedia.org/property/>
SELECT ?film1 ?actor1 ?film2 ?actor2 FROM <http://dbpedia.org> WHERE {
  ?film1 p:starring <http://dbpedia.org/resource/Kevin_Bacon> .
  ?film1 p:starring ?actor1 .
  ?film2 p:starring ?actor1 .
  ?film2 p:starring ?actor2 .
};
film1                                       actor1                                   film2                                       actor2
VARCHAR                                     VARCHAR                                  VARCHAR                                     VARCHAR
_______________________________________________________________________________

http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon  http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon  http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Meryl_Streep
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon  http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Joseph_Mazzello
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon  http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/David_Strathairn
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon  http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/John_C._Reilly
...
-- 23910 rows

SQL>SPARQL PREFIX p: <http://dbpedia.org/property/>
SELECT ?artist ?artwork ?museum ?director FROM <http://dbpedia.org> WHERE {
  ?artwork p:artist ?artist .
  ?artwork p:museum ?museum .
  ?museum p:director ?director
};
artist                                         artwork                                           museum                                                director
VARCHAR                                        VARCHAR                                           VARCHAR                                               VARCHAR
_______________________________________________

http://dbpedia.org/resource/Paul_C%C3%A9zanne  http://dbpedia.org/resource/The_Basket_of_Apples  http://dbpedia.org/resource/Art_Institute_of_Chicago  James Cuno
http://dbpedia.org/resource/Paul_Signac  http://dbpedia.org/resource/Neo-impressionism  http://dbpedia.org/resource/Art_Institute_of_Chicago  James Cuno
http://dbpedia.org/resource/Georges_Seurat  http://dbpedia.org/resource/Neo-impressionism  http://dbpedia.org/resource/Art_Institute_of_Chicago  James Cuno
http://dbpedia.org/resource/Edward_Hopper  http://dbpedia.org/resource/Nighthawks  http://dbpedia.org/resource/Art_Institute_of_Chicago  James Cuno
http://dbpedia.org/resource/Mary_Cassatt  http://dbpedia.org/resource/The_Child%27s_Bath  http://dbpedia.org/resource/Art_Institute_of_Chicago  James Cuno
..
-- 303 rows

SQL>SPARQL PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?homepage FROM <http://dbpedia.org> WHERE {
  <http://dbpedia.org/resource/Berlin> geo:lat ?berlinLat .
  <http://dbpedia.org/resource/Berlin> geo:long ?berlinLong .
  ?s geo:lat ?lat .
  ?s geo:long ?long .
  ?s foaf:homepage ?homepage .
  FILTER ( ?lat <= ?berlinLat + 0.03190235436 &&
           ?long >= ?berlinLong - 0.08679199218 &&
           ?lat >= ?berlinLat - 0.03190235436 &&
           ?long <= ?berlinLong + 0.08679199218)
};
s                                                  homepage
VARCHAR                                            VARCHAR
_______________________________________________________________________________

http://dbpedia.org/resource/Berlin_University_of_the_Arts  http://www.udk-berlin.de/
http://dbpedia.org/resource/Berlin_University_of_the_Arts  http://www.udk-berlin.de/
http://dbpedia.org/resource/Berlin_Zoological_Garden  http://www.zoo-berlin.de/en.html
http://dbpedia.org/resource/Federal_Ministry_of_the_Interior_%28Germany%29  http://www.bmi.bund.de
http://dbpedia.org/resource/Neues_Schauspielhaus  http://www.goya-berlin.com/
http://dbpedia.org/resource/Bauhaus_Archive  http://www.bauhaus.de/english/index.htm
http://dbpedia.org/resource/Canisius-Kolleg_Berlin  http://www.canisius-kolleg.de
http://dbpedia.org/resource/Franz%C3%B6sisches_Gymnasium_Berlin  http://www.fg-berlin.cidsnet.de
..
-- 48 rows

SQL>SPARQL PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX p: <http://dbpedia.org/property/>
SELECT ?s ?a ?homepage FROM <http://dbpedia.org> WHERE {
  <http://dbpedia.org/resource/New_York_City> geo:lat ?nyLat .
  <http://dbpedia.org/resource/New_York_City> geo:long ?nyLong .
  ?s geo:lat ?lat .
  ?s geo:long ?long .
  ?s p:architect ?a .
  ?a foaf:homepage ?homepage .
  FILTER ( ?lat <= ?nyLat + 0.3190235436 &&
           ?long >= ?nyLong - 0.8679199218 &&
           ?lat >= ?nyLat - 0.3190235436 &&
           ?long <= ?nyLong + 0.8679199218)
};
s                                        a                                        homepage
VARCHAR                                  VARCHAR                                  VARCHAR
_______________________________________________________________________________

http://dbpedia.org/resource/GE_Building  http://dbpedia.org/resource/Associated_Architects  http://www.associated-architects.co.uk
http://dbpedia.org/resource/Giants_Stadium  http://dbpedia.org/resource/HNTB  http://www.hntb.com/
http://dbpedia.org/resource/Fort_Tryon_Park_and_the_Cloisters  http://dbpedia.org/resource/Frederick_Law_Olmsted  http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Central_Park  http://dbpedia.org/resource/Frederick_Law_Olmsted  http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Prospect_Park_%28Brooklyn%29  http://dbpedia.org/resource/Frederick_Law_Olmsted  http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Meadowlands_Stadium  http://dbpedia.org/resource/360_Architecture  http://oakland.athletics.mlb.com/oak/ballpark/new/faq.jsp
http://dbpedia.org/resource/Citi_Field  http://dbpedia.org/resource/HOK_Sport_Venue_Event  http://www.hoksve.com/
http://dbpedia.org/resource/Citigroup_Center  http://dbpedia.org/resource/Hugh_Stubbins_Jr.  http://www.klingstubbins.com
http://dbpedia.org/resource/150_Greenwich_Street  http://dbpedia.org/resource/Fumihiko_Maki  http://www.pritzkerprize.com/maki2.htm
http://dbpedia.org/resource/Freedom_Tower  http://dbpedia.org/resource/David_Childs  http://www.som.com/content.cfm/www_david_m_childs
http://dbpedia.org/resource/7_World_Trade_Center  http://dbpedia.org/resource/David_Childs  http://www.som.com/content.cfm/www_david_m_childs
http://dbpedia.org/resource/The_New_York_Times_Building  http://dbpedia.org/resource/Renzo_Piano  http://www.rpbw.com/
http://dbpedia.org/resource/Trump_World_Tower  http://dbpedia.org/resource/Costas_Kondylis  http://www.kondylis.com

13 Rows. -- 2183 msec. 
</pre> </div> <br /> <a name="rdfstorebenchmarks" /> <h3>14.15.11. RDF Store Benchmarks</h3> <a name="rdfstorebenchmarksintroduction" /> <h4>14.15.11.1. Introduction</h4> <p>In any particular RDF store benchmark, it makes a difference whether the queries are executed against one specified graph or against several graphs. As Virtuoso is a quad store, not a triple store with many tables, it runs queries inefficiently when no graph is specified and the only indexes are the pre-set GSPO and OGPS. Proper use of the FROM clause, or adding indexes that include the graph column, will contribute to better results. </p> <br /> <a name="rdfstorebenchmarksindexusage" /> <h4>14.15.11.2. Using bitmap indexes</h4> <p>If it is known in advance that some users will not indicate specific graphs in their queries, then one of the following should be done: </p> <ul> <li>either create indexes with the graph column in last position,</li> <li>or load everything into a single graph and specify that graph somewhere in the querying application.</li> </ul> <p>Neither method requires any changes to query texts.</p> <ul> <li>For users of Virtuoso 5, the use of additional bitmap indexes is strongly recommended: <div> <pre class="programlisting">
SQL> create bitmap index RDF_QUAD_POGS on DB.DBA.RDF_QUAD (P,O,G,S);
SQL> create bitmap index RDF_QUAD_PSOG on DB.DBA.RDF_QUAD (P,S,O,G);
</pre> </div> </li> <li>For users of Virtuoso 6 or higher, see the new layout <a href="rdfperformancetuning.html#rdfperfrdfscheme">here</a>.</li> </ul> <p>You can create other indexes as well. Bitmap indexes are preferable, but if O is the last column then the index cannot be bitmap. So, for example, this index is possible:</p> <div> <pre class="programlisting">
create index RDF_QUAD_PSGO on DB.DBA.RDF_QUAD (P, S, G, O);
</pre> </div> <p>but this is not:</p> <div> <pre class="programlisting">
create bitmap index RDF_QUAD_PSGO on DB.DBA.RDF_QUAD (P, S, G, O);
</pre> </div> <br /> <br /> <a name="fastapproxdiffandpatch" /> <h3>14.15.12. 
Fast Approximate RDF Graph Diff and Patch</h3> <p>The two algorithms described below resemble "unified diff" and "patch by unified diff", but they work on RDF graphs, not on plain texts. </p> <p>They work reasonably well for graphs composed from CBDs (concise bounded descriptions) of some subjects, if these subjects are either "named" IRIs or can be identified by the values of their inverse functional properties. </p> <p>Many sorts of commonly used graphs match these restrictions, including all graphs without blank nodes, most FOAF files, graphs that can be "pretty-printed" in JSON, most dumps of relational databases, etc. </p> <p>The basic idea is as simple as a zipper: </p> <ul> <li>Place one graph at the left and the other at the right.</li> <li>Find a retainer box at the right and a matching pin at the left.</li> <li>Join them.</li> <li>Pull the slider as far as possible.</li> <li>Repeat this while there are pins and boxes that can be matched and sliders that can be moved.</li> </ul> <p>An IRI in the left graph (say, G1) matches the same IRI in the right graph (G2) as a pin matches a box. The same is true for literals. </p> <p>Functional and inverse functional properties are the teeth that form chains; the algorithm "moves sliders" along these chains, incrementally connecting more and more nodes. </p> <p>If there is a match of this sort (O1 in G1 matches O2 in G2) and the matched nodes are values of the same inverse functional property P (there are { S1 P O1 } in G1 and { S2 P O2 } in G2), then we guess that S1 matches S2. </p> <p>If S1 in G1 matches S2 in G2 and the matched nodes are subjects of the same functional property P (there are { S1 P N1 } in G1 and { S2 P N2 } in G2), then we guess that N1 matches N2. Now it is possible to try the same interaction on triples where N1 and N2 are in subject position; that is how sliders move. A typical example of a long zipper is a closed list with matched heads. </p> <a name="fastapproxdiffandpatchhow" /> <h4>14.15.12.1. 
Make a Diff And Use It</h4> <ul> <li>Using <a href="fn_rdf_graph_diff.html">DB.DBA.RDF_GRAPH_DIFF</a> </li> <li>Using <a href="fn_rdf_suo_diff_ttl.html">DB.DBA.RDF_SUO_DIFF_TTL</a> </li> <li>Using <a href="fn_rdf_suo_apply_patch.html">DB.DBA.RDF_SUO_APPLY_PATCH</a> </li> </ul> <br /> <a name="fastapproxdiffandpatchclt" /> <h4>14.15.12.2. Collect Functional And Inverse Functional Properties</h4> <p>A list of functional properties can be retrieved from an ontology graph by a query like:</p> <div> <pre class="programlisting">
SPARQL define output:valmode "LONG"
SELECT (<LONG::sql:VECTOR_AGG(?s)>)
FROM <my-ontology-graph>
WHERE { ?s a owl:FunctionalProperty }
</pre> </div> <p>Inverse functional properties can be retrieved by a similar query, but unfortunately the ontology may mention so-called NULL values, values that can be property values for many subjects. The current implementation of diff and patch does not recognize NULL values, so they can cause the patch to fail with "false alarm" errors. The workaround is to retrieve only properties that have no NULL values declared: </p> <div> <pre class="programlisting">
SPARQL define output:valmode "LONG"
SELECT (<LONG::sql:VECTOR_AGG(?s)>)
FROM <my-ontology-graph>
WHERE { ?s a owl:InverseFunctionalProperty .
    OPTIONAL { ?s owl:nullIFPValue ?v }
    FILTER (!bound(?v)) }
</pre> </div> <p>If no ontology is available, then appropriate predicates can be obtained from sample graphs using <a href="fn_rdf_graph_collect_fp_list.html">DB.DBA.RDF_GRAPH_COLLECT_FP_LIST</a>. </p> <br /> <a name="fastapproxdiffandpatchimpl" /> <h4>14.15.12.3. Implementation-Specific Extensions of GUO Ontology</h4> <p> <strong>Note</strong>: This section contains implementation details that are needed only if you want to write your own patch or diff procedure; you do not have to worry about internals if you use the existing procedures. </p> <p>The basic GUO ontology is not expressive enough to work with blank nodes, so some custom extensions are needed. 
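</p> <p>For orientation, here is what a plain GUO update instruction looks like before any extensions come into play. This is a hypothetical hand-written fragment (the graph, subject, and property IRIs are invented for illustration): </p> <div> <pre class="programlisting">
@prefix guo: <http://webr3.org/owl/guo#> .

[] a guo:UpdateInstruction ;
   guo:target_graph <http://example.com/g> ;
   guo:target_subject <http://example.com/s> ;
   guo:delete [ <http://example.com/p> "old value" ] ;
   guo:insert [ <http://example.com/p> "new value" ] .
</pre> </div> <p>Such an instruction can only name its target subject by IRI, which is exactly what fails for blank nodes; hence the extensions described below. 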
</p> <p>In the rest of the description, the prefix declaration:</p> <div> <pre class="programlisting">
@prefix guo: <http://webr3.org/owl/guo#>
</pre> </div> <p>is assumed.</p> <p>The diff contains one node of rdf:type guo:diff. </p> <p>For debugging purposes it has properties guo:graph1 and guo:graph2 that correspond to the gfrom and gto arguments of <a href="fn_rdf_suo_diff_ttl.html">DB.DBA.RDF_SUO_DIFF_TTL</a>. </p> <p>The diff also contains zero or more nodes of rdf:type guo:UpdateInstruction. These nodes are as described in the basic GUO ontology, but guo:target_graph is now optional, guo:target_subject can be a blank node, and the objects of predicates "inside" the values of guo:insert and guo:delete can also be blank nodes. These blank nodes are "placeholders" for values calculated according to the most important GUO extension: rule nodes. </p> <p>There are eight sorts of rule nodes: four for the gfrom side of the diff and four similar ones for the gto side. Of the four sorts related to one side, two are for functional properties and two similar ones are for inverse functional properties. Thus the rdf:type-s of these nodes are: </p> <div> <pre class="programlisting">
guo:from-rule-FP0 , guo:from-rule-FP1 , guo:from-rule-IFP0 , guo:from-rule-IFP1
</pre> </div> <p>and </p> <div> <pre class="programlisting">
guo:to-rule-FP0 , guo:to-rule-FP1 , guo:to-rule-IFP0 , guo:to-rule-IFP1
</pre> </div> <p>Each rule node has a property guo:order whose value is a non-negative integer. </p> <p>These integers enumerate all guo:from-rule-... nodes, starting from zero. </p> <p>When the patch procedure runs, the rules are applied in this order; the result of each rule is a blank node that either already exists in the graph or is newly created. </p> <p>All results are remembered for use in the rest of the patch procedure. </p> <p>Similarly, another sequence of integers enumerates all guo:to-rule-... nodes, also starting from zero. </p> <p>Below we consider only a sequence of guo:from-rule-... nodes, because guo:to-rule-... nodes have identical properties. 
</p> <p>A rule node can have zero or more values of the guo:dep property; each value is a blank node denoting a rule node that should be calculated before the current one. </p> <p>Every rule has exactly one guo:path predicate whose value is a blank node. Each property of this blank node describes one possible "move of the slider": the predicate to follow is in the predicate position and the node to start from is in the object position. An IRI or a literal in the object position is used as is; a blank node in the object position should be of type guo:from-rule-... and have a smaller guo:order, so that it refers to the already calculated result of some preceding rule. </p> <p>A rule of the form: </p> <div> <pre class="programlisting">
R a guo:from-rule-IFP1 ;
  guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
</pre> </div> <p>searches for a unique blank node _:Rres that is the common subject of the triples: </p> <div> <pre class="programlisting">
_:Rres P1 O1
_:Rres P2 O2
. . .
_:Rres Pn On
</pre> </div> <p>in the gfrom graph. </p> <p>If the subjects differ in these triples, or some triples are not found, or the subject is not a blank node, then an appropriate error is logged and the rule fails; otherwise _:Rres is remembered as the result of the rule. </p> <p>Similarly, a rule of the form: </p> <div> <pre class="programlisting">
R a guo:from-rule-FP1 ;
  guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
</pre> </div> <p>searches for a unique blank node _:Rres that is the common object of the triples: </p> <div> <pre class="programlisting">
O1 P1 _:Rres
O2 P2 _:Rres
. . .
On Pn _:Rres
</pre> </div> <p>in the gfrom graph. </p> <p>A rule of the form: </p> <div> <pre class="programlisting">
R a guo:from-rule-IFP0 ;
  guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
</pre> </div> <p>ensures that the gfrom graph does not contain any triple like: </p> <div> <pre class="programlisting">
_:Rres P1 O1
_:Rres P2 O2
</pre> </div> <p>or </p> <div> <pre class="programlisting">
_:Rres Pn On
</pre> </div> <p>It is an error if any such triple exists. 
If nothing is found, then the result of the rule is a newly created unique blank node. That is how the patch procedure creates new blank nodes when it inserts "totally new" data. </p> <p>Similarly, a rule of the form: </p> <div> <pre class="programlisting">
R a guo:from-rule-FP0 ;
  guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
</pre> </div> <p>ensures that the gfrom graph does not contain any triple like: </p> <div> <pre class="programlisting">
O1 P1 _:Rres
O2 P2 _:Rres
</pre> </div> <p>or </p> <div> <pre class="programlisting">
On Pn _:Rres
</pre> </div> <p>The current version of the patch procedure does not use the guo:to-rule-... rules; however, they can be used by custom procedures in a few ways. First, these rules can be used to produce a "reversed diff". Next, they can be used to validate the result of the patch: if the patch cannot be reverted, then the result is "suspicious". </p> <br /> <br /> <a name="rdb2rdftriggers" /> <h3>14.15.13. RDB2RDF Triggers</h3> <p>RDF Views have many advantages compared to static dumps of the database as RDF triples. However, they do not solve a few problems. First, inference is supported only for physically stored triples, so one has to choose between the convenience of inference and the convenience of RDF Views. Next, the algorithms that select triples with non-constant graphs and predicates tend to produce enormous SQL query texts if the RDF Views are complicated enough. Finally, there may be a need to export a big and fresh static RDF dump, but preparing this dump would take too much time via both RDF Views and traditional methods. </p> <p>The solution is a set of triggers on the source tables of an RDF View that edit parts of a physical dump on each change of the source data. Unlike RDF Views, which cost nothing while not queried, these triggers continuously add a significant overhead to any data manipulation on the sources. To compensate for this, the dump should be in intensive use and not replaceable by RDF Views. In other cases, do not add these triggers. 
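</p> <p>To make the scale of the problem concrete, even a trivial single-table view calls for a trigger along the following lines. This is a hypothetical hand-written sketch, not generated code: the table, graph IRI and property IRIs are invented; DB.DBA.RDF_QUAD_URI and DB.DBA.RDF_QUAD_URI_L insert one quad with an IRI object or a literal object, respectively. </p> <div> <pre class="programlisting">
-- Hypothetical sketch: materialize each newly inserted row of Demo.demo.Products
-- as physical triples in the graph <http://example.com/products#>.
create trigger PRODUCTS_RDB2RDF_I after insert on Demo.demo.Products
{
  declare s_iri varchar;
  s_iri := sprintf ('http://example.com/products#%d', ProductID);
  DB.DBA.RDF_QUAD_URI ('http://example.com/products#', s_iri,
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
    'http://example.com/schema#Product');
  DB.DBA.RDF_QUAD_URI_L ('http://example.com/products#', s_iri,
    'http://example.com/schema#productName', ProductName);
}
;
</pre> </div> <p>A real generated trigger must additionally handle updates and deletes, multiple quad maps producing the same triple, and views excluded via OPTION(EXCLUSIVE), which is why generating the triggers from RDF View metadata is the practical route. 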
</p> <p>It is next to impossible to write such triggers by hand, so a small API is provided to generate their SQL texts from the metadata of RDF Views. </p> <p>First of all, the views in an RDF storage do not work in full isolation from each other. Some of them may partially disable others due to OPTION(EXCLUSIVE), and some may produce one triple in different ways. As a result, triggers are not made on a per-view basis. Instead, a special RDF storage is introduced, namely virtrdf:SyncToQuads; all required triples are added to it, and triggers are created for the whole storage. Typically an RDF View is created in some other storage, e.g., virtrdf:DefaultQuadStorage, and then added to virtrdf:SyncToQuads via: </p> <div> <pre class="programlisting">
sparql
alter quad storage virtrdf:SyncToQuads
  { create <my_rdf_view> using storage virtrdf:DefaultQuadStorage };
</pre> </div> <p>The following example procedure copies all user-defined RDF Views from the default quad storage to virtrdf:SyncToQuads: </p> <div> <pre class="programlisting">
create procedure DB.DBA.RDB2RDF_COPY_ALL_RDF_VIEWS_TO_SYNC ()
{
  for (sparql define input:storage ""
      select (bif:aref (bif:sprintf_inverse (str (?idx), bif:concat (str (rdf:_), "%d"), 0), 0)) ?qm
      from virtrdf:
      where { virtrdf:DefaultQuadStorage-UserMaps ?idx ?qm .
              ?qm a virtrdf:QuadMap }
      order by asc (bif:sprintf_inverse (str (?idx), bif:concat (str (rdf:_), "%d"), 1)) ) do
    exec (sprintf ('sparql alter quad storage virtrdf:SyncToQuads { create <%s> using storage virtrdf:DefaultQuadStorage }', "qm"));
}
;
</pre> </div> <p>When the virtrdf:SyncToQuads storage is fully prepared, two API functions can be used: </p> <ul> <li> <a href="fn_sparql_rdb2rdf_list_tables.html">DB.DBA.SPARQL_RDB2RDF_LIST_TABLES</a>: The function returns a vector of the names of tables that are used as sources for RDF Views. The application developer should decide what to do with each of them: create triggers, or apply some application-specific workaround. 
<p>Note that if some SQL views are used as sources for RDF Views and these views do not have INSTEAD triggers, then workarounds become mandatory for them, not just an option, because BEFORE or AFTER triggers on a view are not allowed unless there is an appropriate INSTEAD trigger. The mode argument should be zero in the current version. </p> </li> <li> <a href="fn_sparql_rdb2rdf_codegen.html">DB.DBA.SPARQL_RDB2RDF_CODEGEN</a>: The function creates an SQL text for a given table and an operation specified by an opcode. </li> </ul> <p>In some cases, RDF Views are complicated enough that BEFORE UPDATE and AFTER DELETE triggers are required in addition to the minimal set. In this case, sparql_rdb2rdf_codegen calls will return a vector of two string sessions rather than a single string session, and both sessions are SQL texts to inspect or execute. In this case, the BEFORE trigger will not delete obsolete quads from the RDF_QUAD table; instead it will create records in a special table, RDF_QUAD_DELETE_QUEUE, as guesses about what can be deleted. The AFTER trigger will re-check these guesses, delete the related quads if needed, and shorten RDF_QUAD_DELETE_QUEUE. </p> <p>The extra activity of triggers on RDF_QUAD, RDF_OBJ, RDF_QUAD_DELETE_QUEUE and other tables and indexes of the storage of "physical" triples may cause deadlocks, so the application should be carefully checked for proper deadlock handling if deadlocks were rare before turning RDB2RDF triggers on. In some cases, the whole RDB2RDF processing can be moved to a separate server connected to the main workhorse server via replication. </p> <p>The following example functions create the texts of all triggers, save them to files for further study, and try to load them. That is probably a bad scenario for a production database, because it is better to read procedures before loading them, especially if they are triggers, and especially if some of them may contain errors. 
</p> <div> <pre class="programlisting">
-- This creates one or two files with one or two triggers or other texts,
-- and tries to load the generated SQL texts.
create procedure DB.DBA.RDB2RDF_EXEC_CODEGEN1_FOR_TABLE (
  in dump_prefix varchar, in tbl varchar, in dump_id any, in txt any )
{
  declare fname varchar;
  declare stat, msg varchar;
  if (isinteger (dump_id))
    dump_id := cast (dump_id as varchar);
  if (__tag of vector = __tag (txt))
    {
      DB.DBA.RDB2RDF_EXEC_CODEGEN1_FOR_TABLE (dump_prefix, tbl, dump_id, txt[0]);
      DB.DBA.RDB2RDF_EXEC_CODEGEN1_FOR_TABLE (dump_prefix, tbl, dump_id || 'p', txt[1]);
      return;
    }
  if (__tag of varchar <> __tag (txt))
    txt := string_output_string (txt);
  fname := sprintf ('%s_Rdb2Rdf.%s.%s.sql', dump_prefix, tbl, dump_id);
  string_to_file (fname, txt || '\n;\n', -2);
  if ('0' = dump_id)
    return;
  stat := '00000'; msg := '';
  exec (txt, stat, msg);
  if ('00000' <> stat)
    {
      string_to_file (fname, '\n\n- - - - - 8< - - - - -\n\nError ' || stat || ' ' || msg, -1);
      if (not (subseq (msg, 0, 5) in ('SQ091')))
        signal (stat, msg);
    }
}
;

-- This creates and loads all triggers, the init procedure and the debug dump
-- related to one table.
create procedure DB.DBA.RDB2RDF_PREPARE_TABLE (in dump_prefix varchar, in tbl varchar)
{
  declare ctr integer;
  for (ctr := 0; ctr <= 4; ctr := ctr + 1)
    DB.DBA.RDB2RDF_EXEC_CODEGEN1_FOR_TABLE (dump_prefix, tbl, ctr,
      sparql_rdb2rdf_codegen (tbl, ctr));
}
;

-- This creates and loads all triggers, the init procedure and the debug dump
-- related to all tables used by an RDF View.
create procedure DB.DBA.RDB2RDF_PREPARE_ALL_TABLES (in dump_prefix varchar)
{
  declare tbl_list any;
  tbl_list := sparql_rdb2rdf_list_tables (0);
  foreach (varchar tbl in tbl_list) do
    {
      DB.DBA.RDB2RDF_PREPARE_TABLE (dump_prefix, tbl);
    }
}
;
</pre> </div> <p>The following combination of calls prepares all triggers for all RDF Views of the default storage: </p> <div> <pre class="programlisting">
DB.DBA.RDB2RDF_COPY_ALL_RDF_VIEWS_TO_SYNC ();
DB.DBA.RDB2RDF_PREPARE_ALL_TABLES (cast (now() as varchar));
</pre> </div> <p>This does not copy the initial state of the RDB2RDF graphs to the physical storage, because that can be dangerous for existing RDF data; even if all procedures work as expected, they may produce huge amounts of RDF data, exceed transaction log limits, and thus require application-specific precautions. It is also possible to do the initial loading with a SPARUL statement like: </p> <div> <pre class="programlisting">
SPARQL INSERT IN <snapshot-graph> { ?s ?p ?o }
FROM <snapshot-graph>
WHERE { quad map <id-of-rdf-view> { ?s ?p ?o } };
</pre> </div> <br /> <table border="0" width="90%" id="navbarbottom"> <tr> <td align="left" width="33%"> <a href="rdfsparqlgeospat.html" title="RDF and Geometry">Previous</a> <br />RDF and Geometry</td> <td align="center" width="34%"> <a href="rdfandsparql.html">Chapter Contents</a> </td> <td align="right" width="33%"> <a href="rdfnativestorageproviders.html" title="RDF Data Access Providers (Drivers)">Next</a> <br />RDF Data Access Providers (Drivers)</td> </tr> </table> </div> <div id="footer"> <div>Copyright© 1999 - 2009 OpenLink Software All rights reserved.</div> <div id="validation"> <a href="http://validator.w3.org/check/referer"> <img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" /> </a> <a href="http://jigsaw.w3.org/css-validator/"> <img src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!" height="31" width="88" /> </a> </div> </div> </body> </html>