Thursday, January 27, 2011

The Role of Workers In Work Flows

Workers are primarily used within VeroServe Workflows as in-memory, serializable data structures that flow through activities and can be persisted by the work flow. Data members used in a Worker should be decorated with the [DataMember] attribute.

Implementations of the doWork() method are optional when execution is intended to be distributed to an ExecutionServe service. 
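
For illustration, a Worker derivative might look like the following minimal sketch. The PayoutWorker name and its properties are hypothetical, and it assumes doWork() is declared as an overridable method on Worker; only Worker, doWork(), and the [DataMember] decoration come from the description above.

using System.Runtime.Serialization;

// Hypothetical example of a Worker that flows through workflow activities.
// [DataContract]/[DataMember] are the usual WCF serialization pairing.
[DataContract]
public class PayoutWorker : Worker
{
    [DataMember]
    public string AccountId { get; set; }   // illustrative payload properties

    [DataMember]
    public decimal Amount { get; set; }

    // Optional when execution is delegated to an ExecutionServe service;
    // shown here only to indicate where local work would go (assumes
    // Worker declares doWork() as overridable).
    public override void doWork()
    {
        // ... perform the work using the [DataMember]-decorated state ...
    }
}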

Monday, December 20, 2010

Configuring the Importer

In order to support different setups and more than one data type, the Importer has a fairly rich set of configuration parameters. I will use a complex setup as an example to describe them and illustrate the Importer's full range of capability.

An Example EDI Setup
FTP servers are still used as a way of providing 'drop boxes' for external partners to electronically supply data (known as Electronic Data Interchange, or EDI for short). While there are many data formats used for EDI, plain delimited text is still common because of its simplicity.

Our example will obtain tab-delimited text files provided by external partners through FTP and import them to two places:
  1. Directly to a primary relational database table named Things through an Importer.
  2. Indirectly to a secondary relational database table named OtherThings by
    1. Submitting the data to a message queue,
    2. Which a second Importer monitors, importing the received data directly into OtherThings.



The First Importer's Configuration
In order to monitor the FTP 'drop box' directories for delimited text files, the first Importer is configured to use DirectoryWatchTableImportService (1) to watch the FTP directories and SingleTableTextSourceImporter (2) to parse the delimited text files as follows:



Here DirectoryWatchTableImportService is configured to
  1. Watch a base directory and all of its subdirectories (3),
  2. Associate subdirectories with sources of data (4),
  3. Associate file names with table import configurations (5).
The configuration above supports an FTP server configured so a partner named SOURCE1 can drop files named either things.ext or stuff.ext in a directory named c:\baseWatchDirectory\source1Dir\. Files named things.ext are associated with a table import configuration (5) that describes how to parse them as delimited text files and then associates them with a table definition (6) that describes how to update the destination table. Files named stuff.ext are similarly associated with their own configuration (7).
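
The configuration itself is the XML shown in the diagram above. Purely as a hypothetical sketch of the same associations in C# terms (the class shape and property names are assumptions; only the component names and the directory, source, and file-name associations come from the example):

// Hypothetical sketch only; the real configuration schema is in the diagram.
var firstImporterService = new DirectoryWatchTableImportService
{
    // (3) watch a base directory and all of its subdirectories
    BaseWatchDirectory = @"c:\baseWatchDirectory",

    // (4) associate subdirectories with sources of data
    SourceDirectories = new Dictionary<string, string>
    {
        { "source1Dir", "SOURCE1" }
    },

    // (5), (7) associate file names with table import configurations,
    // each parsed by SingleTableTextSourceImporter (2)
    TableImportConfigurations = new Dictionary<string, string>
    {
        { "things.ext", "ImportThingsToThings" },
        { "stuff.ext",  "ImportStuffToThings" }   // illustrative id for (7)
    }
};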

Table Definitions
The following diagram expands the full table definition identified in (6) above. A table definition identifies the Updater to use (line #4) to update the destination, the destination database to update (line #5), the destination table to update (line #7), and the mapping of columns from source to destination (line #10), and gives the definition an id (line #1). Note that it also includes the following:
  1. A destination foreign key column name used to store each record's data type, identified in line #8 as the TypeOfThingsId column name below. In this example, the updater will put the table definition id, line #1 'ImportThingsToThings', into this column for each updated destination record.
  2. A destination foreign key column name used to store the source of the data, identified in line #9 as the GroupOfThingsId column name below. The updater will put the source of the data, which for the example configuration is based on the directory, into this column for each updated destination record. For example, if things.ext is put into directory c:\baseWatchDirectory\source1Dir\, the updater will put 'SOURCE1' into this column for each updated destination record.
Note that Namespace.SingleTableTextSourceImporter (2), as used in table import configuration (5) in the prior diagram, maps the column position to a destination column name. For example, line #10b maps the second column of the source file to the destination column name DestColumnName1, effectively ignoring the source column name specified as SourceColumnName1.
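
The table definition itself is the XML in the diagram; as a hypothetical C# rendering of the fields it describes (the property names are assumptions, and the database name is illustrative; the other values come from the example):

// Hypothetical sketch of the 'ImportThingsToThings' table definition described above.
var importThingsToThings = new TableDefinition
{
    Id = "ImportThingsToThings",              // line #1: the table definition id
    UpdaterType = "ThingsUpdater",            // line #4: the Updater to use
    DestinationDatabase = "PrimaryDatabase",  // line #5: destination database (illustrative name)
    DestinationTable = "Things",              // line #7: destination table
    TypeColumnName = "TypeOfThingsId",        // line #8: receives the table definition id
    SourceColumnName = "GroupOfThingsId",     // line #9: receives the data source, e.g. 'SOURCE1'
    ColumnMappings = new Dictionary<int, string>
    {
        { 2, "DestColumnName1" }              // line #10b: second source column -> DestColumnName1
    }
};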


The Second Importer's Configuration
Assuming ThingsUpdater.onRunCompleted() is implemented to inform a queue at path machinename\private$\thingsqueue of updates to Things using the Importer's QueueInformer, a second importer can be configured as follows to update the OtherThingsRepository:
  1. Use QueueImportService (1) to
  2. monitor the queue path (2) and
  3. map data in messages to new table definitions (4) based on their original table definition id.

The table definition mappings above (4) map source definitions to destination definitions. In our example, the first importer associated data in files named things.ext with a table definition with an id of ImportThingsToThings. Based on (4) in the preceding diagram, QueueImportService will map message data associated with this id to a new table definition named ImportThingsToOtherThings. QueueImportService can map more than one source table definition to a destination and can therefore support receiving a variety of import data messages through one queue path.
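
As with the first importer, the real configuration is the XML in the diagram; a hypothetical C# view of the same settings might look like this (class shape and property names are assumptions):

// Hypothetical sketch of the second Importer's configuration described above.
// It assumes ThingsUpdater.onRunCompleted() already posts updates to the queue
// via the Importer's QueueInformer.
var secondImporterService = new QueueImportService
{
    // (2) the queue path to monitor
    QueuePath = @"machinename\private$\thingsqueue",

    // (4) map source table definition ids to destination table definitions;
    // additional entries let one queue path carry several kinds of import data
    TableDefinitionMappings = new Dictionary<string, string>
    {
        { "ImportThingsToThings", "ImportThingsToOtherThings" }
    }
};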

Boiling Down the Details
The previous example described a fairly complex Importer setup. While rich in details, it's important to remember that Importer configuration basically identifies a Service, an Importer, and a column mapping along with an Updater for each data type.

Sunday, December 19, 2010

How the Importer works

The Importer is a simple mechanism that performs four basic steps to import from a source to a destination, each within a corresponding component:
  1. A Service which obtains a source.
  2. An Importer that gets the data from the source.
  3. An Updater that updates one or more destinations with the data.
  4. A Handler that takes whatever action is required after the import has completed. Typical actions include handling individual update errors and informing the source provider of the status of the import.




Because there are a variety of ways to perform each of the four steps, the Importer is designed so each of the four corresponding components can be extended or replaced. The points where components can be extended are colored red in the following:

 

It isn't necessary to extend any of the components other than TableDataUpdater, however, because the Importer comes with the following stock implementations to cover common uses:
  1. Both DirectoryWatch and Queue Services for monitoring directories and queues respectively for source data.
  2. An Importer that extracts data from delimited text sources.
  3. A Handler that saves records that were rejected by Updaters and provides a user interface for editing and re-submitting them.
The only component that needs to be extended is TableDataUpdater, and it only requires one override, getMatchingEntityWhereClause(), to help it identify unique destination records. ValidateValues() can also be overridden if necessary to customize the logic used to validate records before updating the destination. Otherwise, TableDataUpdater already implements basic validation and all the CRUD functionality required to update virtually any relational database destination 'out of the box'.
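
As a minimal sketch of such an updater (the base-class signatures, the ImportRow row type, and the 'ThingNumber' key column are assumptions; only TableDataUpdater, getMatchingEntityWhereClause(), and ValidateValues() come from the description above):

// Hypothetical sketch of a TableDataUpdater extension for the Things table.
public class ThingsUpdater : TableDataUpdater
{
    // Required override: build the WHERE clause that identifies the unique
    // destination record matching an incoming row (assumed row type and key column).
    protected override string getMatchingEntityWhereClause(ImportRow row)
    {
        return string.Format("ThingNumber = '{0}'", row["ThingNumber"]);
    }

    // Optional override: add validation specific to Things on top of the
    // stock validation already provided by TableDataUpdater.
    protected override bool ValidateValues(ImportRow row)
    {
        if (!base.ValidateValues(row))
            return false;

        return !string.IsNullOrEmpty(row["ThingNumber"] as string);
    }
}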



Extensible yet simple, the Importer provides both a foundation for new functionality and the basic functionality required to import from text files to destination tables almost entirely through configuration.

Thursday, August 19, 2010

Introducing "VeroMark", Object-RDF Mapping (ORM)

"VeroMark" semantic client user interface builder adds to WPF's already rich, XAML-based presentation system powerful UI control extensions as well as the ability to manage data retrieval, data bind UI elements to it, and navigate between pages entirely through configuration, thus enabling the construction of complete WPF user interfaces without touching compiled code. Instead of producing mock-ups and explaining what they imagine the business needs, picture your data and business analysts marking up their ideas directly as functioning UI screens, allowing your software developers to focus on more technical aspects.

As a part of eliminating the need for compiled code, VeroMark supports data binding to an RDF triples-based client model through an object-RDF mapping (ORM) mechanism, which decouples presentation from an underlying "VeroServe Data" data model such that the application's usability isn't limited by it. This article introduces "VeroMark" object-RDF mapping and illustrates key concepts using a simple example.

Building Blocks
WPF's built-in data binding technology is at the heart of how VeroMark connects user interface elements to the client data and keeps them in sync with one another. The client data model available in a page is "described" in XAML markup (this is the ORM) and made available for data binding via one or more EntityProviders. A user interface element is then data-bound to one of these EntityProviders (via the XAML Binding expression "Source"), which at run time provides in-memory EntityValue object instance(s) that give access to the actual data to be interacted with on the screen.


Figure 1 Logical relationships between data binding components

Conceptually, think of a description as a Type definition that includes information about how to access property values from within the client data, and an EntityValue as an instance of that Type that provides access to the described properties from a specific, run-time data context (e.g. some specific Id chosen on a search results page).

There is also an EntityValue derivative called Entity, which adds the ability to contain other EntityValue instances (think Composite pattern) and makes them accessible via its property indexer and their description Name. The containment hierarchy of Entity/EntityValue(s) returned by an EntityProvider is determined by the containment hierarchy of its associated description.
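
In code terms, a hedged sketch of that containment and indexing (the BlogEntity and ReferenceName names come from the example in the next section; how the Entity instance is obtained is glossed over here):

// Hypothetical sketch: navigating an Entity as described above.
Entity blogEntity = /* provided at run time by an EntityProvider */ null;

// Contained EntityValues are reached through the indexer by description Name...
EntityValue referenceName = blogEntity["ReferenceName"];

// ...and StringValue exposes the string form of the underlying Literal.
string text = referenceName.StringValue;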

Binding Example
Here we show a small example based on the example Ontology. First, we create the description that defines our "BlogEntity". Each property description below defines the property name, the value type (Literal or Identifier), and a SPARQL graph pattern for obtaining the unit of client data using a specific data context:

<Controller:EntityDescription x:Key="BlogEntityDescription">
  <Controller:EntityDescription.PropertyDescriptions>
    <Controller:EntityValueDescription Name="Type" ValueType="Utilities:Identifier">
      <Controller:EntityValueDescription.DataAccessor>
        <Controller:Template DefaultPattern="?BlogEntity rdf:type ?Type"/>
      </Controller:EntityValueDescription.DataAccessor>
    </Controller:EntityValueDescription>
    <Controller:EntityValueDescription Name="ReferenceName" ValueType="Utilities:Literal">
      <Controller:EntityValueDescription.DataAccessor>
        <Controller:Template DefaultPattern="?BlogEntity upo:hasReferenceName ?ReferenceName"/>
      </Controller:EntityValueDescription.DataAccessor>
    </Controller:EntityValueDescription>
    <Controller:EntityValueDescription Name="Location" ValueType="Utilities:Identifier">
      <Controller:EntityValueDescription.DataAccessor>
        <Controller:Template DefaultPattern="?BlogEntity upo:hasLocation ?Location"/>
      </Controller:EntityValueDescription.DataAccessor>
    </Controller:EntityValueDescription>
  </Controller:EntityDescription.PropertyDescriptions>
</Controller:EntityDescription>

Next, we create an EntityProvider instance that references the description (via the "DescriptionSource" property):

<Dsp:EntityProvider x:Key="BlogEntityProvider" PageModelManager="{StaticResource ModelManager}" RootVariable="BlogEntity">
  <Controller:DescriptionGenerator DescriptionSource="{StaticResource BlogEntityDescription}" IsSharedDescription="True">
    <Controller:Setter PropertyName="Name" Value="BlogEntity"/>
  </Controller:DescriptionGenerator>
</Dsp:EntityProvider>

And finally, we data-bind the Text property of a UI TextBox to the "ReferenceName" property we defined as part of the BlogEntity description:

Text="{Binding Source={StaticResource BlogEntityProvider}, Path=[ReferenceName].StringValue, Mode=OneWay}"

A WPF Binding Path starts from a particular object instance (the one provided by the Binding Source) and follows the path of property relationships to the resulting property value. Properties preceded by the "." syntax refer to compile-time properties, while those encased in square brackets ("[]") refer to a property indexer with a signature Type-compatible with the value between the brackets.

We will cover EntityValue properties in more detail in a future article, but for now note that "StringValue" is a compile-time property that gives us access to the string form of the "ReferenceName" Literal in the client data.


Figure 2 Working Example showing "BlogEntity" properties bound to UI elements

In future articles we will elaborate on various aspects of VeroMark data binding including:
  • View Model properties available for binding (via the XAML Binding expression "Path")
  • Data context containment-inheritance
  • Templates
  • Binding converters
  • Other data source "providers" (e.g. FindProvider)


Monday, August 2, 2010

Tabular Data from RDF

RDF data usually needs to be transformed into a tabular form for export to traditional relational database systems. While on the surface they may look quite different, RDF data and tabular data are just different ways of organizing the same thing.  Understanding a little about RDF, how it compares to tables, and how to query it is all that's needed to begin experimenting with generating tables from RDF data, which you can do right here from this posting.

RDF
An RDF data set is a collection of statements of interest about something. Each statement consists of three things: a subject, a predicate, and an object, collectively known as a triple. Triples are just like simple natural language statements. In the natural language statement "Jane Smith's haircolor's blonde.", "Jane Smith" is the subject, "haircolor" is the predicate, and "blonde" is the object. Together they make an assertion of fact about the world.

RDF Compared to Tables
In a table, like those within relational databases, the subject, predicate, and object are modeled as a row, column, and intersecting cell, respectively. For example, if we also made the statement that "John Doe's haircolor's brown" we could represent these statements in a simple table as:

person       haircolor   
---------    ---------
Jane Smith   blonde   
John Doe     brown  

Queries
A SPARQL SELECT query can be used to transform RDF data into a table. For example, a 'Persons' table can be obtained from some example RDF data by executing the following SPARQL SELECT query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix upo:  <http://mydomain.org/myupperontology/myversion/>
prefix o:  <http://mydomain.org/myontology/myversion/>

SELECT ?name ?haircolor ?country ?region
FROM <http://www.compass-point.net/data/ontology.ttl>
WHERE
{
    ?id upo:hasReferenceName ?name . 
    ?id rdf:type o:Person . 
    ?id upo:hasLocation ?loc . 
    ?loc rdf:type upo:Country . 
    ?loc upo:hasReferenceName ?country . 
    ?id o:hasHairColor ?hc . 
    ?hc upo:hasReferenceName ?haircolor . 
    ?loc o:countryIn ?reg . 
    ?reg upo:hasReferenceName ?region .
}

When executed against the RDF data, this query results in the following 'Persons' table:

--------------------------------------------------------------
| name         | haircolor | country | region                |
==============================================================
| "Jane Smith" | "Blonde"  | "China" | "Asia Pacific Region" |
| "John Doe"   | "Brown"   | "UK"    | "UK / Europe Region"  |
--------------------------------------------------------------

In the previous query, note the following:
  • The variable declarations on the line "SELECT ?name ?haircolor ?country ?region" define the columns of the returned table as 'name', 'haircolor', 'country', and 'region'. 
  • The portion after the word WHERE specifies a set of triple patterns that a SPARQL processor must find within the RDF data for there to be a variable match. For example, taken by themselves, the first two lines are two triple patterns that a SPARQL processor would use to find all subjects for which both of the following statements have been made:
  1. A statement asserting it hasReferenceName
  2. A statement asserting it is of type Person.
  In effect, a SPARQL SELECT query transforms RDF data into a table.


Kicking the Tires
You can try this out for yourself. Select and copy the following SPARQL SELECT query, paste it into this SPARQL Processor, choose "text output", and then click the "get results" button.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix upo:  <http://mydomain.org/myupperontology/myversion/>
prefix o:  <http://mydomain.org/myontology/myversion/>

SELECT ?name ?type ?country
FROM <http://www.compass-point.net/data/ontology.ttl>
WHERE
{
    ?id upo:hasReferenceName ?name . 
    ?id rdf:type ?type . 
    ?type rdfs:subClassOf o:Party . 
    ?id upo:hasLocation ?loc . 
    ?loc o:countryIn o:UKR .     
    ?loc upo:hasReferenceName ?country
}

This query results in a "Parties" table populated with only those parties in the UK / Europe Region.


-----------------------------------------------------------------------------------
| name          | type                                                 | country  |
===================================================================================
| "MusPub Inc." | <http://mydomain.org/myontology/myversion/Publisher> | "France" |
| "John Doe"    | <http://mydomain.org/myontology/myversion/Person>    | "UK"     |
-----------------------------------------------------------------------------------

To create other queries, refer to the example RDF data when crafting your query, and then modify both the part after the word SELECT and the part within the braces after the word WHERE. Note that the triple patterns within the WHERE braces must be separated by a space-period-space, and the FROM line must stay the same to access the example RDF data.



What's next
Subsequent articles will branch from here down both practical and conceptual paths, covering
  • How to export from "VeroServe Data" to both Reporting and Integration Services, 
  • How to make data bindings to RDF data in a "VeroMark" semantic client user interface,
  • How "VeroServe Data" supports encapsulation of data, data structure, and ultimately business logic in one place through the use of RDF.

Tuesday, July 27, 2010

"Vero" what??

“VeroRight” is the code name identifying a suite of technologies that enable rapid construction of complete semantic data solutions. Unlike common solution development platforms that result in data being stuffed into rigid, quickly-aging models accessed by expensive, custom-programmed applications – causing business knowledge to be broken apart and buried within data, code, systems, tools, and people – “VeroRight” is designed from the ground up for:
  • Changing data models and surrounding applications without corruption or excessive cost.
  • Gathering knowledge in a single place instead of spreading it across data, code, and people.
  • Deriving value from data in new, unanticipated ways today and in the future.
  • Containing costs by reducing dependency on expensive software engineering.
With "VeroRight", a complete semantic data solution is mostly configured, not programmed, then deployed, enterprise-ready:



The “VeroRight” suite of technologies includes:
  • “VeroMark”, the semantic client user interface builder
  • “VeroServe”, a set of semantic data services
  • The xServe set of enterprise infrastructure services:
    • “LogServe” enterprise diagnostics service
    • “ProfileServe” enterprise user profile service
    • “PermissionServe” enterprise authorization service
    • “IdentityServe” enterprise identity management and authentication service
    • “ExecutionServe” distributed work execution service
Because data is modeled semantically, VeroRight solutions can deliver accuracy despite change -- which explains at least the first part of the name: Vero is Italian for truth.


Monday, July 19, 2010

Executing work from a "VeroServe Workflow"

While the interaction of an enterprise user application with its back-end services is orchestrated in a “VeroServe Workflow”, execution of intensive or performance-sensitive “VeroServe Workflow” work can in turn be delegated to one or more instances of ExecutionServe for load balancing across servers. “VeroServe Workflow” hides the details of delegating work to ExecutionServe through its ExecuteActivity.




To use ExecuteActivity, first add it to Visual Studio by hovering over the toolbox while displaying a work flow, right-clicking, selecting the “System.Activities.Components” tab, and then browsing for ExecutionActivity.dll (built from svn://[repositoryAddress]/[company]/WorkflowServices/WorkflowServices.sln). Then drag it to the appropriate location in your work flow.

ExecuteActivity requires configuration of the following in and out parameters:

ExecuteActivity.DLL Activity Control:

In Parameters:
   callbackAddress          String
   workflowInstanceId       String
   executeActivityNumber    String
   worker                   Worker
   executionServiceAddress  String


Out Parameters:
   executeReturnParameter   ExecuteReturnParameter


    public class ExecuteReturnParameter
    {
        public string Status { ... }
        public string Message { ... }
        public string AssemblyName { ... }
        public string ClassName { get; set; }
        public string SerializedWorker { ... }
        public ExecuteParameter ExecutionParameter { ... }
    }


Work flows containing the ExecuteActivity must also have a client endpoint element in their web.config named "IExecutionServiceEndpointConfigurationName". The following web.config entry can be cut and pasted directly into a "VeroServe Workflow" web.config without any modification to satisfy this requirement:

Web.config Settings:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.serviceModel>
    <bindings>
      <basicHttpBinding>
        <binding name="BasicHttpBinding_IExecutionService" .... >
        </binding>
      </basicHttpBinding>
    </bindings>
    <client>
      <endpoint binding="basicHttpBinding" 
        bindingConfiguration="BasicHttpBinding_IExecutionService" contract="IExecutionService" address="http://localhost/ExecutionService.svc"
        name="IExecutionServiceEndpointConfigurationName" />
    </client>
  </system.serviceModel>
</configuration>

The following figure is an example of settings for ExecuteActivity parameters:



ExecuteActivity's in parameters include:
  • callbackAddress, the execution URL of the “VeroServe Workflow” containing the ExecuteActivity. In this figure, this "VeroServe Workflow" is configured to execute at "http://localhost:1392/PayoutWorkflowService.xamlx".
  • workflowInstanceId, a unique identifier for the executing instance of the "VeroServe Workflow". In this example, the variable workflowId was set in the "VeroServe Workflow" prior to this ExecuteActivity with a unique Guid generated for the executing instance.
  • executeActivityNumber, a unique constant that identifies the ExecuteActivity within the "VeroServe Workflow". In this example, the variable invocationNumber was set to a constant that uniquely identifies this ExecuteActivity within the "VeroServe Workflow".
  • worker, an instance of a derivative of Worker, with its DataMember-decorated properties set to their values in preparation for execution within ExecutionServe.
  • executionServiceAddress, the URL of an ExecutionServe service.

ExecuteActivity also requires an ExecuteReturnParameter out parameter defined in Entities.dll (built from svn://[repositoryAddress]/[company]/WorkflowServices/WorkflowServices.sln), which will include a Worker-defined status and an ExecutionParameter that can be used by Worker's deserialize() static method to re-instantiate the Worker within the work flow after ExecuteActivity completes. Modifications made to any Worker property decorated with the [DataMember] attribute during execution directed by ExecuteActivity will be subsequently reflected in these deserialized values.


    public abstract class Worker
    {
        public static Worker deserialize(ExecuteParameter executeParameter)
        { ... }

        public ExecuteParameter serialize()
        { ... }
    }
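
For example, once ExecuteActivity completes, the surrounding work flow might re-instantiate its Worker roughly as follows (a sketch; only deserialize(), ExecutionParameter, Status, Message, and the [DataMember] round-tripping behaviour come from the description above):

// Hypothetical sketch: rebuilding the Worker after ExecuteActivity completes.
ExecuteReturnParameter result = /* ExecuteActivity's executeReturnParameter out parameter */ null;

// deserialize() re-instantiates the Worker from the returned ExecutionParameter;
// any [DataMember] properties changed during distributed execution are reflected here.
Worker worker = Worker.deserialize(result.ExecutionParameter);

// Status and Message report the outcome of the distributed execution.
string status = result.Status;
string message = result.Message;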