Monday, December 20, 2010

Configuring the Importer

In order to support different setups and more than one data type, the Importer has a fairly rich set of configuration parameters. I will use a complex setup as an example to describe them and illustrate the Importer's full range of capability.

An Example EDI Setup
FTP servers are still used as a way of providing 'drop boxes' for external partners to electronically supply data (known as Electronic Data Interchange, or EDI for short). While there are many data formats used for EDI, delimited plain text is still common because of its simplicity.

Our example will obtain tab-delimited text files provided by external partners through FTP and import them to two places:
  1. Directly to a primary relational database table named Things through an Importer.
  2. Indirectly to a secondary relational database table named OtherThings by
    1. Submitting the data to a message queue, and
    2. Having a second Importer monitor that queue and import the received data directly into OtherThings.



The First Importer's Configuration
In order to monitor the FTP 'drop box' directories for delimited text files, the first Importer is configured to use DirectoryWatchTableImportService (1) to watch the FTP directories and SingleTableTextSourceImporter (2) to parse the delimited text files as follows:



Here DirectoryWatchTableImportService is configured to
  1. Watch a base directory and all of its subdirectories (3),
  2. Associate subdirectories with sources of data (4),
  3. Associate file names with table import configurations (5).
The configuration above supports an FTP server set up so that a partner named SOURCE1 can drop files named either things.ext or stuff.ext into a directory named c:\baseWatchDirectory\source1Dir\. Files named things.ext are associated with a table import configuration (5) that describes how to parse them as delimited text files and then associates them with a table definition (6) that describes how to update the destination table. Files named stuff.ext are similarly associated with their own configuration (7).
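
Pulling the pieces above together, a configuration of roughly the following shape would cover the example. This is only a sketch: every element and attribute name is an illustrative guess, and only the class names, the base directory, the partner and file names, and the ImportThingsToThings table definition id come from the description above (ImportStuffToThings is a made-up id for the second file).

  <!-- Hypothetical sketch of the first Importer's configuration; element names are illustrative only. -->
  <importer name="FirstImporter">
    <!-- (1) the Service that watches the FTP 'drop box' directories -->
    <service type="DirectoryWatchTableImportService">
      <!-- (3) base directory to watch, including its subdirectories -->
      <watch baseDirectory="c:\baseWatchDirectory\" includeSubdirectories="true" />
      <!-- (4) subdirectories are associated with sources of data -->
      <source name="SOURCE1" subdirectory="source1Dir" />
      <!-- (5) file names are associated with table import configurations; (2) parses the delimited text -->
      <tableImportConfiguration fileName="things.ext" delimiter="tab"
          importer="Namespace.SingleTableTextSourceImporter"
          tableDefinition="ImportThingsToThings" />  <!-- (6) -->
      <!-- (7) stuff.ext gets its own configuration in the same way -->
      <tableImportConfiguration fileName="stuff.ext" delimiter="tab"
          importer="Namespace.SingleTableTextSourceImporter"
          tableDefinition="ImportStuffToThings" />
    </service>
  </importer>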

Table Definitions
The following diagram expands out the full table definition identified in (6) above. A table definition has an id (line #1) and identifies the Updater to use (line #4), the destination database to update (line #5), the destination table to update (line #7), and the mapping of columns from source to destination (line #10). Note that it also includes the following:
  1. A destination foreign key column name used to store each record's data type, identified in line #8 below as the TypeOfThingsId column. In this example, the updater will put the table definition id from line #1, 'ImportThingsToThings', into this column for each updated destination record.
  2. A destination foreign key column name used to store the source of the data, identified in line #9 below as the GroupOfThingsId column. The updater will put the source of the data, which for the example configuration is based on the directory, into this column for each updated destination record. For example, if things.ext is put into directory c:\baseWatchDirectory\source1Dir\, the updater will put 'SOURCE1' into this column.
Note that Namespace.SingleTableTextSourceImporter (2), as used in table import configuration (5) in the prior diagram, maps column positions to destination column names. For example, line #10b maps the second column of the source file to the destination column DestColumnName1, effectively ignoring the source column name specified as SourceColumnName1.
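
As a rough sketch of the information a table definition carries, it might look something like the following. The element names, the database name, and the extra first column mapping are assumptions; the id, the Things table, the foreign key column names, and the line #10b mapping follow from the description above, and the updater shown is the ThingsUpdater referenced in the next section.

  <!-- Hypothetical sketch of table definition (6); element names are illustrative only. -->
  <tableDefinition id="ImportThingsToThings">           <!-- line #1: the id -->
    <updater type="ThingsUpdater" />                    <!-- line #4: the Updater to use -->
    <destinationDatabase name="PrimaryDatabase" />      <!-- line #5: the database to update -->
    <destinationTable name="Things" />                  <!-- line #7: the table to update -->
    <typeOfThingsIdColumn name="TypeOfThingsId" />      <!-- line #8: stores the data type -->
    <groupOfThingsIdColumn name="GroupOfThingsId" />    <!-- line #9: stores the source -->
    <columnMappings>                                    <!-- line #10: source-to-destination mapping -->
      <column position="1" sourceName="SourceColumnName0" destinationName="DestColumnName0" />
      <column position="2" sourceName="SourceColumnName1" destinationName="DestColumnName1" />  <!-- line #10b -->
    </columnMappings>
  </tableDefinition>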


The Second Importer's Configuration
Assuming ThingsUpdater.onRunCompleted() is implemented to inform a queue at the path machinename\private$\thingsqueue of updates to Things using the Importer's QueueInformer, a second Importer can be configured as follows to update the OtherThingsRepository:
  1. Use QueueImportService (1) to
  2. monitor the queue path (2) and
  3. map data in messages to new table definitions (4) based on their original table definition id.

The table definition mappings above (4) map source definitions to destination definitions. In our example, the first Importer associated data in files named things.ext with a table definition whose id is ImportThingsToThings. Based on (4) in the preceding diagram, QueueImportService will map message data associated with this id to a new table definition named ImportThingsToOtherThings. QueueImportService can map more than one source table definition to a destination definition and can therefore support receiving a variety of import data messages through one queue path.
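
A sketch of the second Importer's configuration might look roughly like this. As before, the element names are guesses; the class name, the queue path, and the two table definition ids come from the description above.

  <!-- Hypothetical sketch of the second Importer's configuration; element names are illustrative only. -->
  <importer name="SecondImporter">
    <!-- (1) the Service that monitors the queue instead of a directory -->
    <service type="QueueImportService">
      <!-- (2) the queue path that ThingsUpdater.onRunCompleted() informs via QueueInformer -->
      <queue path="machinename\private$\thingsqueue" />
      <!-- (4) incoming table definition ids are mapped to new table definitions; more than
           one mapping can be listed to receive different data types through one queue path -->
      <tableDefinitionMapping source="ImportThingsToThings"
          destination="ImportThingsToOtherThings" />
    </service>
  </importer>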

Boiling Down the Details
The previous example described a fairly complex Importer setup. While it is rich in detail, it's important to remember that an Importer configuration basically identifies a Service, an Importer, and a column mapping along with an Updater for each data type.
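
Boiled down that way, the skeleton of a configuration is short. Continuing with the same hypothetical element names used in the sketches above:

  <!-- The essential pieces: a Service, an Importer, and a column mapping plus an Updater per data type. -->
  <importer>
    <service type="DirectoryWatchTableImportService" />
    <tableImportConfiguration importer="Namespace.SingleTableTextSourceImporter"
        tableDefinition="ImportThingsToThings" />
    <tableDefinition id="ImportThingsToThings">
      <updater type="ThingsUpdater" />
      <columnMappings> ... </columnMappings>
    </tableDefinition>
  </importer>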

Sunday, December 19, 2010

How the Importer works

The Importer is a simple mechanism that performs four basic steps to import from a source to a destination, each step within a corresponding component:
  1. A Service which obtains a source.
  2. An Importer that gets the data from the source.
  3. An Updater that updates one or more destinations with the data.
  4. A Handler that takes whatever action is required after the import has completed. Typical actions include handling individual update errors and informing the source provider of the status of the import.
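
As a loose illustration of how these four roles relate, a set of interfaces along the following lines captures the flow. The interface names, method names, and supporting types here are assumptions, not the Importer's actual API.

  // A loose sketch of the four roles and how data flows between them.
  // All names and signatures are illustrative, not the Importer's real API.
  public class ImportedData { /* rows obtained from a source */ }
  public class ImportResult { /* per-record successes and rejections */ }

  public interface IService        // 1. obtains a source (a dropped file, a queued message, ...)
  {
      void Run(IImporter importer, IUpdater updater, IHandler handler);
  }

  public interface IImporter       // 2. gets the data from the source
  {
      ImportedData GetData(string sourcePath);
  }

  public interface IUpdater        // 3. updates one or more destinations with the data
  {
      ImportResult Update(ImportedData data);
  }

  public interface IHandler        // 4. acts once the import has completed (errors, notifications)
  {
      void OnImportCompleted(ImportResult result);
  }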




Because there are a variety of ways to perform each of the four steps, the Importer is designed so that each of the four corresponding components can be extended or replaced. The points where components can be extended are colored red in the following diagram:

 

It isn't necessary to extend any of the components other than TableDataUpdater, however, because the Importer comes with the following stock implementations to cover common uses:
  1. Both DirectoryWatch and Queue Services for monitoring directories and queues respectively for source data.
  2. An Importer that extracts data from delimited text sources.
  3. A Handler that saves records that were rejected by Updaters and provides a user interface for editing and re-submitting them.
The only component that needs to be extended is TableDataUpdater, and it only requires one override, getMatchingEntityWhereClause(), to help it identify unique destination records. ValidateValues() can also be overridden if necessary to customize the logic used to validate records before updating the destination. Otherwise, TableDataUpdater already implements basic validation and all the CRUD functionality required to update virtually any relational database destination 'out of the box'.
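
As a concrete illustration, a minimal TableDataUpdater subclass might look like the following. The class name ThingsUpdater and the two method names come from this post; the method signatures, the ThingNumber column, and the base-class behavior assumed here are guesses.

  using System.Collections.Generic;

  // A minimal sketch of the one extension the Importer requires.
  // Signatures are assumed; only the class and method names appear in the post.
  public class ThingsUpdater : TableDataUpdater
  {
      // Tells TableDataUpdater how to find an existing destination record that matches
      // an incoming record, so it can decide between an insert and an update.
      protected override string getMatchingEntityWhereClause(IDictionary<string, object> record)
      {
          // Hypothetical natural key: assume ThingNumber uniquely identifies a Thing.
          return string.Format("ThingNumber = '{0}'", record["ThingNumber"]);
      }

      // Optional: add custom checks on top of the basic validation the base class performs.
      protected override bool ValidateValues(IDictionary<string, object> record, IList<string> errors)
      {
          if (string.IsNullOrEmpty(record["ThingNumber"] as string))
          {
              errors.Add("ThingNumber is required.");
              return false;
          }
          return base.ValidateValues(record, errors);
      }
  }

In the EDI example above, this is the same ThingsUpdater named by the first table definition, with onRunCompleted() additionally informing the queue through QueueInformer.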



Extensible yet simple, the Importer provides both a foundation for new functionality and the basic functionality required to import from text files to destination tables almost entirely through configuration.