XML-DBMS, Perl Version 1.01

***** Introduction *****

To move data between XML documents and the database, you first need to create (write or generate) a mapping document, which describes how elements and attributes in the XML document are mapped to tables and columns in the database. You then need to write a program (or use the sample program) that calls the XMLToDBMS module.

For a complete introduction to XML-DBMS, see http://www.rpbourret.com/xmldbms/index.htm.

***** Installation *****

To install the Perl version of XML-DBMS, use the following commands:

perl Makefile.PL
make 
make install

***** Use *****

The following sample shows how to store and retrieve the data in an XML document.

use DBI;
use XML::XMLtoDBMS;

my $dbh = DBI->connect("dbi:Informix:dbname", "username", "password");
my $xmlTodbms = new XML::XMLtoDBMS($dbh);

$xmlTodbms->setMap('Sales.map');
my $doc = $xmlTodbms->storeDocument( Source => {File => "Sales.xml"} );
$doc->dispose;			# do not forget to release DOM memory

# select a specific document by primary key
$doc = $xmlTodbms->retrieveDocument('Sales', [['123']]); 

$xml = $doc->toString;
print $xml . "\n";
$doc->dispose;			# do not forget to release DOM memory

$xmlTodbms->destroy;		# undefs all DBI prepared statement handles 
$dbh->disconnect;

***** Differences between the Perl and Java implementations *****

While I have tried my best to keep the code similar, there are differences for the following reasons:

- SAX parser level (Java uses level 2 SAX parser)
- Database API differences (DBI vs. JDBC)
- Some things that did not work for my practical needs.

These have resulted in the following:

1) Namespaces declared in the mapping document are ignored.

2) Uniform treatment of the data types.

No SQL-type special cases are handled. This is not a problem in most cases because of DBI's and Perl's loose data typing: you can bind a Perl variable to a column of any type and hope that the conversion succeeds somewhere further along the way.
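For example (a minimal sketch; the table and column names are illustrative), DBI happily binds a string scalar to a numeric column and leaves the conversion to the driver or the database:

# illustrative table/column; DBI defers type conversion to the driver/DB
my $sth = $dbh->prepare("INSERT INTO Sales (Amount) VALUES (?)");
$sth->execute("123.45");    # a string scalar bound to a numeric column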

Date/time/timestamp format conversions are done to match the functionality found in the Java package. Currently, I use Perl's regular-expression capabilities to recognize dates/timestamps on the fly and convert them to/from the format patterns specified in a map file. The Java implementation uses JDK datetime formatting based on locale and style. There is no Perl counterpart, and I found it a bit limiting. Instead I used token format encoding, at least for now.

The following tokens are used for datetime formatting: YY, YYYY, MM, DD, hh, mm, ss. In the absence of format specifiers the output format is ISO-8601 "YYYY-MM-DD hh:mm:ss". I used TimeDate.pm from Graham Barr to do the work. The same format patterns are used to recognize datetimes when parsing an XML document for inserts into the database. An attempt is made to convert each string (every one!) into a datetime, and if it succeeds the value is further converted to the US-en locale date format (MM/DD/YY). The time part is ignored because inserting a datetime string into a date field gives an error with some databases. Most databases do not understand the ISO-8601 format, so I hard-coded the conversion to my database's locale. This is cumbersome and was done for practical purposes only.
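To illustrate the token approach, here is a minimal sketch (not the module's actual code; it assumes the literal characters in the pattern are regex-safe) that turns a format pattern into a regex and reformats a match as MM/DD/YY:

use strict;
use warnings;

my %token_re = (
    YYYY => '(\d{4})', YY => '(\d{2})', MM => '(\d{2})',
    DD   => '(\d{2})', hh => '(\d{2})', mm => '(\d{2})', ss => '(\d{2})',
);

# Translate a pattern such as "YYYY-MM-DD hh:mm:ss" into a regex,
# remembering the order in which the tokens appear.
sub pattern_to_regex {
    my ($pattern) = @_;
    my @order;
    $pattern =~ s/(YYYY|YY|MM|DD|hh|mm|ss)/push @order, $1; $token_re{$1}/ge;
    return (qr/^$pattern$/, \@order);
}

my ($re, $order) = pattern_to_regex('YYYY-MM-DD hh:mm:ss');
if (my @vals = '2003-06-15 10:30:00' =~ $re) {
    my %v;
    @v{@$order} = @vals;
    # Drop the time part and emit the US-en date format used for inserts.
    printf "%s/%s/%s\n", $v{MM}, $v{DD}, substr($v{YYYY}, -2);   # 06/15/03
}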

Ideally, one needs to know the database locale and column type to properly format data and avoid string-to-date/datetime conversion errors. I intend to rewrite this in the future to use table metadata, but so far I have not been able to see how this can be done with DBI without database-specific code.

3) Key Generator.

Key generators create the "next" unique key for inserts. The Java package implementation uses the XMLDBMSKey table to do what Oracle achieves with a sequence. Since I needed a key that was max(ID) + 1 from the table I was inserting into, my key generator does exactly this. It also converts the number to a string with a number of pre-appended spaces, in case the data type of the ID column is character with right-adjusted values, which is not an uncommon practice.
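A minimal sketch of this strategy (the helper name is hypothetical, and the table, column, and width in the usage line are illustrative; $dbh is a handle obtained from DBI->connect as in the sample above):

use strict;
use warnings;

# Return max(ID) + 1, optionally right-adjusted to a fixed character width.
sub next_key {
    my ($dbh, $table, $id_col, $width) = @_;
    my ($max) = $dbh->selectrow_array("SELECT MAX($id_col) FROM $table");
    my $next = (defined $max ? $max : 0) + 1;
    # Pre-append spaces when the ID column is character with
    # right-adjusted values; omit $width to get a plain number.
    return defined $width ? sprintf('%*d', $width, $next) : $next;
}

my $key = next_key($dbh, 'Sales', 'ID', 10);   # e.g. '       124'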

In the future, I feel that key generators can be extended to cover at least the following cases:

1) Keys are MS SQL Server-like GUIDs;
2) As in the Java implementation: keys for all tables come from one sequence stored in some table (an Oracle-like sequence);
3) Keys come from a sequence table that stores the max/next key separately for each table;
4) key = max(key) + N from the table the inserts are done to; in effect, method 3, but without the auxiliary table. (This is the current implementation.)

Note that to allow users to choose between these generators, a change in the mapping file format is required to indicate which generator to use.

One final note about key generators: in general, you cannot use autoincrement fields as a foreign key for inserts of related child records, because that would require an extra read in the same transaction between the parent and child inserts. However, if you know the autoincrement step, you can bypass this by using a key generator of type 3 or 4 (see above) working in sync with the autoincrement fields, as sketched below.
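A hedged sketch of that bypass, assuming a step of 1 and illustrative table names; predicting the key this way only works if nothing else inserts into the table concurrently:

# Predict the value the autoincrement column will assign (step assumed 1),
# so child rows can reference the parent without re-reading it.
my ($max) = $dbh->selectrow_array('SELECT MAX(OrderID) FROM Orders');
my $parent_key = (defined $max ? $max : 0) + 1;
$dbh->do('INSERT INTO Orders (CustName) VALUES (?)', undef, 'ACME');
$dbh->do('INSERT INTO Lines (OrderID, Item) VALUES (?, ?)',
         undef, $parent_key, 'widget');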

4) Transactions.

The package does not manage transactions itself. In other words, commit-on-insert is the default unless you enclose storeDocument() in BEGIN, COMMIT, and ROLLBACK commands.
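With DBI, one way to do this (a sketch reusing the handles from the sample above) is to turn off AutoCommit and commit or roll back explicitly:

$dbh->{AutoCommit} = 0;    # turn off commit-on-insert
$dbh->{RaiseError} = 1;    # make failures die so eval can catch them
eval {
    my $doc = $xmlTodbms->storeDocument( Source => {File => "Sales.xml"} );
    $doc->dispose;
    $dbh->commit;          # all inserts succeed or none do
};
if ($@) {
    warn "storeDocument failed, rolling back: $@";
    $dbh->rollback;
}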


5) Changes in 1.01. 

The changes to this version are primarily bug fixes from 1.0 and extended capabilities in retrieveDocument().

The retrieveDocument() function can now be used with an additional filter for the select statement that determines the topmost node set. The original Perl version allowed selection by primary key only, while the Java version has a retrieveDocument() function that takes a ResultSet as a parameter. This addition brings an analogous capability to the Perl version.

Three other parameters have been added to limit the range of the parent records. This is useful for scrolling through a document.

The new signature of retrieveDocument() is:

   retrieveDocument (parentTable, keysArray, selectFilter,
                     firstRecord, numberOfRecords, refTotalRecords)

a) Passing undef in keysArray will return all the records in the parent table. Otherwise, pass an array of primary-key arrays, for example [ [1,0..], [1,1..], [1,2..], ..]

b) selectFilter should be specified in the form "colname1=value and colname2=value and ..." and will be applied to the selection of the parent recordset -- that is, the topmost element in the document.

c) firstRecord - ordinal number of the first record in the parent recordset to include in the document.

d) numberOfRecords - number of records to include in the document.

e) refTotalRecords - a reference to a variable that, if passed, will hold the total number of records in the parent recordset (not the number actually returned). This parameter greatly aids scrolling: the first call to retrieveDocument() tells you the size of the parent recordset, so you know when you have reached the end.

(Without the total number of records, the only way to tell whether you are at the end is to check the actual number of returned nodes and compare it with the requested range. This is not too good a solution, considering you would need an extra call to retrieveDocument(), which returns an empty document if the range coincided with the end of the recordset in the preceding call.)
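For example, a sketch of scrolling through a document page by page (the table name and filter are illustrative, and firstRecord is assumed to be 1-based):

my $page_size = 50;
my $total;      # set by retrieveDocument() through the reference
my $first = 1;  # assumed 1-based ordinal
do {
    my $doc = $xmlTodbms->retrieveDocument('Sales', undef, "Status='open'",
                                           $first, $page_size, \$total);
    print $doc->toString, "\n";
    $doc->dispose;          # do not forget to release DOM memory
    $first += $page_size;
} while ($first <= $total);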