Monday, December 20, 2010

Configuring the Importer

In order to support different setups and more than one data type, the Importer has a fairly rich set of configuration parameters. I will use a complex setup as an example to describe them and illustrate the Importer's full range of capability.

An Example EDI Setup
FTP servers are still used as a way of providing 'drop boxes' for external partners to electronically supply data (known as Electronic Data Interchange, or EDI for short). While there are many data formats used for EDI, clear delimited text is still common because of its simplicity.

Our example will obtain tab-delimited text files provided by external partners through FTP and import them to two places:
  1. Directly to a primary relational database table named Things through an Importer.
  2. Indirectly to a secondary relational database table named OtherThings by
    1. Submitting the data to a message queue
    2. Which a second Importer will monitor and import received data directly into OtherThings.



The First Importer's Configuration
In order to monitor the FTP 'drop box' directories for delimited text files, the first Importer is configured to use DirectoryWatchTableImportService (1) to watch the FTP directories and SingleTableTextSourceImporter (2) to parse the delimited text files as follows:



Here DirectoryWatchTableImportService is configured to
  1. Watch a base directory and all of its subdirectories (3),
  2. Associate subdirectories with sources of data (4),
  3. Associate file names with table import configurations (5).
The configuration above supports a FTP server configured so a partner named SOURCE1 can drop files named either things.ext or stuff.ext in a directory named c:\baseWatchDirectory\source1Dir\. Files named things.ext are associated with a table import configuration (5) that describes how to parse it as a delimited text file and then associate it with a table definition (6) that describes how to update the destination table. Files named stuff.ext are similarly associated with their own configuration (7).

Table Definitions
The following diagram expands out the full table definition identified in (6) above. A table definition identifies the Updater to use (line #4) to update the destination, the destination database to update (line #5), the destination table to update (line #7), the mapping of columns from source to destination (line #10), and gives it an id (line #1). Note that it also includes the following:
  1. A destination foreign key column name used to store records' data type, identified in line #8 as TypeOfThingsId column name below. In this example, the updater will put the table definition id, line #1 'ImportThingsToThings', into this column for each updated destination record.
  2. A destination foreign key column name used to store the source of data, identified in line #9 as GroupOfThingsId column name below. The updater will put the source of the data, which for the example configuration is based on the directory, into this column for each updated destination record. For example, if things.ext is put into directory c:\baseWatchDirectory\source1Dir\, the updater will put SOURCE1' into this column for each updated destination record.
Note that Namespace.SingleTableTextSourceImporter (2), as used in table import configuration (5) in the prior diagram, maps the column position to a destination column name. For example, line #10b maps the second column of the source file to the destination column name DestColumnName1, effectively ignoring the source column name specified as SourceColumnName1.


The Second Importer's Configuration
Assuming ThingsUpdater.onRunCompleted() is implemented to inform a Queue path machinename\private$\thingsqueue of updates to Things using Importer's QueueInformer, a second importer can be configured as follows to update the OtherThingsRepository:
  1. Use QueueImportService (1) to 
  2. Monitor the queue path (2) and 
  3. Map data in messages to new table definitions (4) based on their original table definition id. 

The table definition mappings above (4) map source definitions to destination definitions. In our example, the first importer associated data in files named things.ext to a table definition with an id of ImportThingsToThings. Based on (4) in the preceding diagram,  QueueImportService will map message data associated with this id to a new table definition named ImportThingsToOtherThingsQueueImportService can map more than one source table definition with destination and can therefore support receiving a variety of import data messages through one queue path.

Boiling Down the Details
The previous example described a fairly complex Importer setup. While rich in details, its important to remember that Importer configuration basically identifies a Service, an Importer, and a column mapping along with an Updater for each data type.

1 comment:

  1. Powerful. This should also be able to facilitate other EDI transfers with partners without the complexity of commercial integration platforms. Nice!

    ReplyDelete