Case Management: Modeling Time Varying Relationships

By David Tryon, November 1998

Background

The Case Management System needs to be able to capture, store, and report upon the
relationships among a number of business entities of interest to the Company in its efforts
to track and respond to investigation related activities. These entities include questionable
Transactions, Accounts, Cases, Various Persons and Organizations, and Investigation
Management Issues.

It is not enough to know what the relationships among these entities are at one moment in
time, say the current moment. Investigators need to be able to look for patterns of
relationships through time. Criminals are not so accommodating as to make their
intentions clear. In many significant ways, a Case is the recording of a pattern of
changing relationships through a period of time.

Traditional data modeling represents the facts about a company as they exist at a point
in time. The databases designed from such models have the same characteristic.
Representing time varying relationships among facts requires the use of what are called
temporal or historical database techniques. This approach is very powerful in terms of
what can be recorded and reported upon; it is, however, considerably more complex than
normal business system database designs.

Purpose

The purpose of this paper is to document the specific design approach taken to modeling
time in a Case Management Application. This documentation will be of use to the user in
understanding “what is going on” in the application, as well as giving them a sense of
what kinds of time varying information can and cannot be stored in the Application. It
will also be useful to future technical personnel maintaining the application.

Technical Principles

The data design principles used in the Case Management System’s database design are
based upon Peter Chen’s Entity-Relationship Model [1], Codd’s Extended Relational Model [2],
and Snodgrass’ three levels of representing time in a relational database [3].

Considering Chen’s work, data about the enterprise can be described as Entity-sets,
Relationship-sets, and Value-sets. Codd mapped these set theory ideas onto the
mathematics of relations; this relational interpretation of the Chen Entity-Relationship
model forms the basis of our database design in the Case Management System.

The Entity-sets and Relationship-sets are represented as tables. Each member of an
Entity-set or Relationship-set is a row in its table. Each Value-set value that is
functionally mapped from an Entity-set or Relationship-set in the Entity-Relationship
Model becomes a value in a column of that row. So far, this is standard data modeling
/ database design.

In the Case Management System, “Main Function” tables and “Lookup” tables
implement the Entity-sets; the “Assignment” tables implement the Relationship-sets.

Several of the Relationship-sets do not have explicit “Assignment” tables; instead, their
relationships are represented by an embedded “foreign key” in the core tables. Specifically,
these are the relationships of the Action and Transaction entities to Account and Case. The
reason is that Action and Transaction represent events - point-in-time occurrences - whereas
Case, Account, Known Party, and Case Management Issues represent business objects or
“things” that have duration - they exist for a span of time. For a more complete explanation,
see endnote [4].

Temporal / Historical Database Design

The Entity-Relationship Model represents a “snapshot” in time: the current view of the
known facts, the attributes of the Entity-sets and Relationship-sets. This “current
snapshot in time” perspective is also true of the databases developed from such models. Even
“historical” facts captured in these databases represent the “current” time’s point of view
about what was true at the point in the historical past being described. They
represent “what we know now about what was true then.”

A consequence of this “current” perspective is that historical facts are represented as
separate attributes from current facts and are implemented as two separate fields. If we
need to save all past values, the set of past values becomes its own table with each past
value represented as a row in the “historical” table. This approach is so standard that it is
not even thought of as odd. It would mean that if we wanted complete history on every
attribute, every attribute would become its own table.

For several decades, database theorists have been trying to deal with the whole issue of
time in a more elegant and less developer intensive way. There are three possible types
of time varying databases: Transaction, Historical, and Temporal.

Transaction Databases. In Transaction Time, the times when changes are made to values
stored in the database are recorded; regular database technology does this with logging
and recovery. These databases can “rollback” updates to an earlier point in time through a
“database recovery.” This form of time representation is seldom used in an Application.

Historical Databases. Historical databases explicitly record when each fact in the
database changes out in the “real world.” This means that when a change is recorded to a
value stored in the database, the time when the change actually occurred in the business
must also be recorded. This form of Historical Database is just beginning to become
available commercially in some high-end databases. A Historical Database has an
interesting effect on an application written against it. The idea of “current” goes away;
“current” becomes a matter of selection criteria. Retrieval with today’s date is “current.”
Retrieval with last month’s date is “historical.” Retrieval with next month’s date is
“pending.”
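The way “current” turns into a selection criterion can be sketched in a few lines of Python. This is an illustration only, not the Application's code: the `Fact` record, the `value_as_of` helper, and the sample names and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    """One historical row: a value and the real-world period it held (hypothetical layout)."""
    value: str
    begin: date
    end: Optional[date]  # None means the fact has no known end yet


def value_as_of(facts, as_of):
    """Return the value in effect on `as_of`, or None if no fact covers that date."""
    for fact in facts:
        if fact.begin <= as_of and (fact.end is None or as_of <= fact.end):
            return fact.value
    return None


# Hypothetical history of a Case's lead investigator, including a change not yet in effect.
history = [
    Fact("Adams", date(1998, 9, 1), date(1998, 10, 31)),
    Fact("Baker", date(1998, 11, 1), date(1998, 11, 30)),
    Fact("Clark", date(1998, 12, 1), None),
]

today = date(1998, 11, 15)
historical = value_as_of(history, date(1998, 10, 15))  # last month's date
current = value_as_of(history, today)                  # today's date
pending = value_as_of(history, date(1998, 12, 15))     # next month's date
```

The same function answers all three questions; only the date passed in differs.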

Temporal Databases. Temporal databases explicitly record not only when each fact
changes in the “real world,” but also when each fact was recorded in the
database. This means that not only can we retrieve the true “change” history; we can
retrieve a complete history of all “corrections” and “cancellations” ever made. This
database form combines both the Transaction and the Historical time capabilities. It
requires two time points for retrieval: the “As Of” time of the data and the “As Viewed
From” time for the report. Full temporal databases are an active research area, but are not
yet available as commercial products.
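A retrieval that takes both time points might look like the following Python sketch. It is not part of the Case Management System, which implements only a limited Historical Database; the `Version` layout, the `lookup` function, and the sample data are assumptions made purely to illustrate the “As Of” and “As Viewed From” idea.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Version:
    """One recorded version of a fact (hypothetical layout)."""
    value: str
    valid_from: date               # when the fact became true in the real world
    valid_to: Optional[date]       # None = still true
    recorded_on: date              # when this version entered the database
    superseded_on: Optional[date]  # None = not yet corrected or cancelled


def lookup(versions, as_of, as_viewed_from):
    """Return the value true at `as_of`, according to what the database held at `as_viewed_from`."""
    for v in versions:
        known = v.recorded_on <= as_viewed_from and (
            v.superseded_on is None or as_viewed_from < v.superseded_on)
        valid = v.valid_from <= as_of and (v.valid_to is None or as_of <= v.valid_to)
        if known and valid:
            return v.value
    return None


# An owner name was misspelled when entered on 10 January and corrected on 5 February.
versions = [
    Version("Smith", date(1998, 1, 1), None, date(1998, 1, 10), date(1998, 2, 5)),
    Version("Smyth", date(1998, 1, 1), None, date(1998, 2, 5), None),
]

before_fix = lookup(versions, date(1998, 1, 15), as_viewed_from=date(1998, 1, 20))
after_fix = lookup(versions, date(1998, 1, 15), as_viewed_from=date(1998, 2, 10))
```

Both calls ask about the same “As Of” date; they differ only in the “As Viewed From” date, so the first returns the value as originally recorded and the second returns the correction.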

The Case Management System Application is implementing a limited version of a
Historical Database. We are capturing time history on the tables representing
Relationship-sets in the data model. These are the Assignment tables in the Application.
To accomplish this, we include a Begin Date and an End Date field in each Assignment
table row. The idea is that when a relationship is established between two entities (what we
generally refer to as “an assignment”), the Begin Date is set to the effective date of the
“assignment.” Later, when the relationship is ended, we set the End Date to the
termination date. We do not delete the row from the Assignment table.
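A minimal sketch of this begin/end convention follows, using Python's built-in sqlite3 module as a stand-in for the Access database. The table name `CaseAccountAssignment`, its columns, and the two helper functions are hypothetical; dates are stored as ISO text only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE CaseAccountAssignment (   -- hypothetical Assignment table
           CaseID    INTEGER NOT NULL,
           AccountID INTEGER NOT NULL,
           BeginDate TEXT    NOT NULL,        -- ISO date the relationship took effect
           EndDate   TEXT                     -- stays NULL while the relationship is open
       )"""
)


def begin_assignment(case_id, account_id, effective_date):
    """Record a new relationship with an open (NULL) End Date."""
    conn.execute(
        "INSERT INTO CaseAccountAssignment VALUES (?, ?, ?, NULL)",
        (case_id, account_id, effective_date),
    )


def end_assignment(case_id, account_id, termination_date):
    """Close the open row instead of deleting it, so the history is preserved."""
    conn.execute(
        "UPDATE CaseAccountAssignment SET EndDate = ? "
        "WHERE CaseID = ? AND AccountID = ? AND EndDate IS NULL",
        (termination_date, case_id, account_id),
    )


begin_assignment(1, 42, "1998-03-01")
end_assignment(1, 42, "1998-06-15")
begin_assignment(1, 42, "1998-09-01")   # the same pair can be related again later

rows = conn.execute(
    "SELECT BeginDate, EndDate FROM CaseAccountAssignment ORDER BY BeginDate"
).fetchall()
```

After these calls the table holds two rows for the same Case and Account: one closed period and one still open.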

Retrieval of data for reports must always be sure to limit the selected Assignment rows to
those that were effective within the desired time-period.

Some thoughts about “Time”: Temporal Granularity, Points-in-time, and Time-Period (Duration)

About here, readers must be asking themselves whether all this is necessary; after all,
we’re just talking about a little desktop application, right? The truth is that we all use
these ideas on a day-to-day basis. We insist on clarity when discussing time-periods -
when filling in a time sheet for an hourly contractor, is the paid period through or until
the end time point? An hour’s pay rides on the distinction. What is the unit of time to be
used - the temporal granularity? Is the work shift from 8 A.M. until 4 P.M. (through?)? Is
it from 8:00 A.M. until 4:00 P.M. (3:59 P.M.)? Do we pay for 8 hours or 8 hours and one
minute?

People are great at contextual reasoning! In case of doubt, a person would just say, “You
know, 8 hours pay” and the fuzziness would be resolved. Computers absolutely fail with
fuzzy situations, as do payroll clerks. So, we must be clear or the reports retrieved from
different points of view may not “fit together” too well. Anyone who has ever tried to
balance the dollars between a weekly payroll system and a monthly general ledger system
understands the consequences of incommensurate temporal granularity.

In the Case Management System Application, time is recorded to the limits of the
Microsoft Date data type - some millionths of a second. Time reporting requests will be
(unless otherwise specified - such as when an Action occurred) in days.







Some reports are logically point-in-time oriented. An example would be a report to see
all “Currently Open Cases.” Others are inherently for a time-period, such as “Cases
Opened During the Month.” In order to simplify and make the query process more
uniform, all queries will be stated as a duration with a Begin Date and an End Date. This
approach will work because we will assume all queries to be inclusive of the Begin Date
and the End Date. Therefore, a single point-in-time, a day, is represented when the Begin
Date and the End Date are equal.

An important comment about unspecified End Dates. When an assignment begins, the
End Date is usually not yet known. Therefore, rather than “kludging” the End Date field
with some arbitrarily high date value (for example, January 1, 2500) so that date
comparison logic will “work,” it has been decided to be correct and leave currently
“open” assignments as “Null.” There is no value assigned to the End Date field. This
means that retrieval logic must assume that a “Null” End Date is greater than any
specified Reporting End Date for comparison purposes.


Some practical consequences of the difference in the stored temporal granularity
(microsecond) and the retrieval temporal granularity (day).

The Report Begin Date must be converted to midnight at the start of the first day of the
report period (e.g., 2/23/97 becomes 2/23/97 @ 00:00:00). The Report End Date must be converted to
one second before midnight at the end of the day (e.g., 2/27/97 becomes 2/27/97 @
23:59:59). This conversion also supports the “point-in-time” convention of making the
Begin and End Dates the same to reference a single day. If the report were trying to
capture all the Investigative Actions taken on a given day, the correct (and complete) set
of Actions would be retrieved. “Now” type reports will use the Report End Date as the
“Now” so that reports “as of a day” will include anything that happens during
that day.
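The date conversion can be sketched in Python as follows. The function names `report_window` and `occurred_in_window` are hypothetical, and Python's `datetime` stands in for the Access Date type.

```python
from datetime import date, datetime, time


def report_window(begin_day: date, end_day: date):
    """Expand whole-day report dates into the timestamps used for comparison."""
    if end_day < begin_day:
        raise ValueError("Report End Date precedes Report Begin Date")
    window_begin = datetime.combine(begin_day, time(0, 0, 0))   # midnight at start of day
    window_end = datetime.combine(end_day, time(23, 59, 59))    # one second before midnight
    return window_begin, window_end


def occurred_in_window(event_time: datetime, begin_day: date, end_day: date) -> bool:
    """Inclusive test of a point-in-time event against the report period."""
    window_begin, window_end = report_window(begin_day, end_day)
    return window_begin <= event_time <= window_end


# A single-day report: the Begin and End Dates are the same day.
day = date(1997, 2, 23)
morning_action = datetime(1997, 2, 23, 9, 30)
late_action = datetime(1997, 2, 23, 23, 59, 59)
next_day_action = datetime(1997, 2, 24, 0, 0, 0)
```

Under this convention a timestamp carrying a fraction of a second after 23:59:59 would fall outside the window. Comparing with “less than midnight of the following day” is one way to avoid that, though it is not the convention the paper specifies.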

What Assignments Are “Valid” During a Time-Period?

If there are a number of assignment records being tested to see if they are to be selected
as part of the reporting time-period - a duration - what constitutes “being valid” during
the time period? For example, let us say we are selecting for the month of February.
Does the assignment have to be valid for the entire month to be accepted? What if the
assignment becomes valid halfway through the month; is it accepted? What if it is valid
at the beginning of the month and expires halfway through; is it accepted? What if it
becomes valid on the fifth and expires on the 21st?

In the Case Management System Application the standard assumption will be that any
assignment that was valid for any portion of a time period will be considered valid
and selected. If we only want assignments valid on the last day of the month, we select for
only the last day, not the entire month. If we only want assignments valid on the first day and
still valid on the last day of the month, the selection must be explicitly defined in that
way through a custom query.
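The difference between the standard assumption and such a custom selection can be sketched in Python. Both function names and the February 1998 dates are hypothetical, and dates are compared at day granularity.

```python
from datetime import date
from typing import Optional


def valid_any_portion(begin: date, end: Optional[date],
                      rpt_begin: date, rpt_end: date) -> bool:
    """Standard rule: the assignment is valid for at least part of the report period."""
    return begin <= rpt_end and (end is None or end >= rpt_begin)


def valid_first_and_last_day(begin: date, end: Optional[date],
                             rpt_begin: date, rpt_end: date) -> bool:
    """Custom rule: already valid on the first day and still valid on the last day."""
    return begin <= rpt_begin and (end is None or end >= rpt_end)


feb_begin, feb_end = date(1998, 2, 1), date(1998, 2, 28)

# Becomes valid on the fifth and expires on the 21st.
mid_month = (date(1998, 2, 5), date(1998, 2, 21))
# Valid since January and still open (no End Date yet).
open_ended = (date(1998, 1, 10), None)
```

The mid-month assignment is selected by the standard rule but not by the custom one; the open-ended assignment is selected by both.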

Standard Begin / End Date Query Logic

The issue will always come down to the precise structure of the Assignment Begin Date
and Assignment End Date comparisons with the Report Begin Date and Report End Date.

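                                   Reporting                 Reporting
                                  Begin Date                 End Date
  Data Validity Period                 |<---- Report Period ---->|
                                       |                         |
    1. Included                        |     B1---------E1       |
    2. Includes                B2------|-------------------------|------E2 or NULL
    3. Ends During             B3------|------------E3           |
    4. Begins During                   |     B4------------------|------E4 or NULL
    5. Ends Before          B5----E5   |                         |
    6. Begins After                    |                         |   B6---E6 or NULL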




Given our definition of “valid during the time period,” there are several cases to be
considered. These alternatives are the six shown in the diagram above.

As the reader can see, the inclusion rules get quite complex and compounded. However,
the exclusion rules are much simpler. They reduce to just two cases:

#6 - Case A: If the Data Begin Date is greater than the Report End Date.
#5 - Case B: If the Data End Date is less than the Report Begin Date.

If either of these conditions is true, the data is NOT selected. In all other cases, select the
data.

Logically, this rule also takes care of active, open Assignment records that have a
Null Data End Date. Because the Null stands for a high-value date, it should make Case B
false, which is the correct result. Unfortunately, a date comparison against Null does not
evaluate that way in the database, so it is necessary to test separately for a Data End
Date that is Null. This is annoying, but not an insurmountable obstacle.
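The two exclusion tests, with the separate Null check, can be tried against the six validity periods in the diagram. This is a sketch under assumptions: `is_excluded` is a hypothetical helper, Python's `None` plays the role of Null, and the dates are invented to match each diagram row for a February 1998 report period.

```python
from datetime import date
from typing import Optional


def is_excluded(data_begin: date, data_end: Optional[date],
                report_begin: date, report_end: date) -> bool:
    """Apply the two exclusion cases; anything not excluded is selected."""
    case_a = data_begin > report_end                           # begins after the report period
    # A Null (None) End Date is tested separately: an open assignment never "ends before".
    case_b = data_end is not None and data_end < report_begin  # ends before the report period
    return case_a or case_b


report_begin, report_end = date(1998, 2, 1), date(1998, 2, 28)

# The six validity periods from the diagram, with hypothetical dates.
cases = {
    "1. Included":      (date(1998, 2, 5),  date(1998, 2, 21)),
    "2. Includes":      (date(1998, 1, 15), None),
    "3. Ends During":   (date(1998, 1, 15), date(1998, 2, 10)),
    "4. Begins During": (date(1998, 2, 10), None),
    "5. Ends Before":   (date(1998, 1, 5),  date(1998, 1, 20)),
    "6. Begins After":  (date(1998, 3, 5),  None),
}

selected = [name for name, (b, e) in cases.items()
            if not is_excluded(b, e, report_begin, report_end)]
```

Only rows 5 and 6 are excluded, so `selected` holds the first four cases without the four inclusion patterns ever being spelled out.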





Implementing the Date Comparison Logic in SQL

In order to implement this in the Case Management System Application’s database, the
propositional logic above must be translated into the appropriate relational algebra
expressions, and from there into an SQL “WHERE” clause. As we are implementing this
application in Access, we will express this conversion in the syntax of the Access
Query Design Grid.

The Access Query Design Grid expresses the components of the generated SQL
“WHERE” clauses as successive rows in the "Criteria" section. Each row in the criteria
section represents a set of conditions that are logically “AND’ed” together. For a data
record to be selected it must pass all of the criteria conditions on a row. Multiple criteria
rows are inclusively “OR’ed” together. This means that if a data record passes any row, it
is selected. It does not need to pass more than one row to be selected.

This syntax means that we must manipulate our acceptance criteria into the form of one
or more sets of “AND’ed” conditions. That is the goal.

So, where do we start? We know from the previous section that we have two logical
tests, Case A and Case B. Furthermore we know that if a data record passes either test, it
is excluded. What we need is a transformation to restructure the test in the form of
“(condition-1) AND (condition-2) AND (condition-3) AND ...” to fit in with the syntax
of the Access Query Design Grid.

Let us start by translating our exclusion criteria into inclusion criteria. Let us simplify
our notation to be that Case A is known simply as A and Case B is known simply as B.
The exclusion rule is:

        If A OR B = TRUE then exclude the record.

The opposite of this rule, the inclusion rule, is obtained by logically negating this
expression. In other words the inclusion rule is:

        If NOT(A OR B) = TRUE then include the record.

We now have an inclusion rule, but it is not in the “right” form for the Access Query
Design Grid. A negated compound does not fit the grid’s layout of “AND’ed” conditions
on each row. Fortunately there is a logical equivalence theorem - De Morgan’s law, known to
programming types as the “propagation of ‘nots’” - that states the following equivalence:

        NOT (A OR B) = (NOT A) AND (NOT B)

We replace this equivalent expression in our “inclusion” rule and it now reads:

        If (NOT A) AND (NOT B) = TRUE then include the record.
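The equivalence can be checked exhaustively, since A and B each take only two truth values. The short Python sketch below builds the full truth table; the variable names are illustrative only.

```python
from itertools import product

# Check every combination of truth values for Case A and Case B.
rows = []
for a, b in product([False, True], repeat=2):
    negated_compound = not (a or b)
    and_of_negations = (not a) and (not b)
    rows.append((a, b, negated_compound, and_of_negations))

# The two forms agree on all four rows of the truth table.
equivalent = all(left == right for _, _, left, right in rows)
```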






Note that this is now in the “right” syntax to be expressed in the Access Query Design
Grid. Let us recall where we are. Our compound rule fully expanded is:

        If (NOT Case A) AND (NOT Case B) = TRUE then include the record.
        Case A = Data Begin Date is greater than the Report End Date
        Case B = Data End Date is less than the Report Begin Date

Inequality logic comparisons are a little tricky. We must remember that:

        If X less than Y = TRUE then NOT (X greater than or equal Y) = TRUE

Therefore, negating Case A and Case B according to this rule gives:

        NOT Case A = Data Begin Date is less than or equal the Report End Date
        NOT Case B = Data End Date is greater than or equal the Report Begin Date

Now, substituting these definitions of NOT Case A and NOT Case B into the expanded
“inclusion” rule above we get:

        If (Data Begin Date is less than or equal the Report End Date)
                AND
           (Data End Date is greater than or equal the Report Begin Date)
        then include the record.

Now, the Access Query Design Grid has columns of each data element in the record. In
particular, it has columns for Data Begin Date and Data End Date. Therefore, the entry
in the Access Query Design Grid looks like the following:

                                    Data Begin Date             Data End Date

                 criteria         <= [Report End Date]     >=[Report Begin Date]

Because of the somewhat inelegant way that Access handles Nulls in data type specific
comparisons, we must explicitly include a second criteria row to account for the possibility of
the Data End Date being Null. The Access Query Design Grid then looks like this:

                                    Data Begin Date             Data End Date

                 criteria         <= [Report End Date]     >=[Report Begin Date]
                 criteria         <= [Report End Date]           Is Null
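Written out as a “WHERE” clause, the two criteria rows become two “AND’ed” groups joined by OR. The sketch below runs that clause through Python's sqlite3 module rather than Access, so it is an approximation of what the grid generates, not a copy of it; the `Assignment` table, its column names, the named parameters, and the sample rows are all hypothetical, and dates are stored as ISO text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Assignment (Name TEXT, DataBeginDate TEXT, DataEndDate TEXT)")
conn.executemany(
    "INSERT INTO Assignment VALUES (?, ?, ?)",
    [
        ("closed, overlaps", "1998-01-15", "1998-02-10"),
        ("open, overlaps", "1998-02-10", None),
        ("ended before", "1998-01-05", "1998-01-20"),
        ("open, begins after", "1998-03-05", None),
    ],
)

# Each parenthesised group corresponds to one criteria row; the rows are OR'ed together.
query = """
    SELECT Name FROM Assignment
    WHERE (DataBeginDate <= :report_end AND DataEndDate >= :report_begin)
       OR (DataBeginDate <= :report_end AND DataEndDate IS NULL)
    ORDER BY DataBeginDate
"""
selected = [row[0] for row in conn.execute(
    query, {"report_begin": "1998-02-01", "report_end": "1998-02-28"})]
```

Without the second group, the open assignment that overlaps the report period would be dropped, because the comparison of its Null End Date with the Report Begin Date does not evaluate to true.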

The design technique used in the FDID Application is to replace the “Report Begin Date”
and “Report End Date” statements in the Access Query Design Grid with references to
the values of Controls on the Report Date Range Input form held open, but not visible,
during report creation. As long as all underlying queries for reports include this logic,
only the correct assignment records will be selected by the queries and submitted to the
report subsystem for report creation.






Endnotes
[1] Chen, P.P.; "The Entity-Relationship Model: A Basis for the Enterprise View of Data"; Proceedings of
the 1977 National Computer Conference; Dallas, Texas; AFIPS Conference Proceedings, Volume 46; and
Chen, P.P.; "The Entity-Relationship Model: Toward a Unified View of Data"; ACM Transactions on
Database Systems, Volume 1, Number 1; March 1976.
[2] Codd, E.F.; "Extending the Database Relational Model to Capture More Meaning"; SIGMOD
Proceedings; Boston, Mass.; June, 1979.
[3] http://www.cs.arizona.edu/people/rts/ and
Snodgrass, Richard T.; Developing Time-Oriented Database Applications in SQL; Morgan Kaufmann,
San Francisco, California; 2000.
[4] “Thing” Entities versus “Event” Entities. Entity-sets can represent either “things” or “events.” Events
differ from Things in that they exist only at a point in time; Things have duration. An Event is what it is
because it represents the recording of an action against some Thing that changes the state of the Thing. For
example, a Transaction is an action against the current balance of an Account. There are several important
points: 1) the Transaction Event occurs at a single point of time from the point of view of the acted upon
Thing, the Account. 2) Without the Thing, Account, the Transaction Event has no meaning - more
formally, the Transaction has both existence and identity dependence upon the Account. The first point
means that an Event Entity has no duration, no “history,” as it exists at only a single point of time. The
second point, the Event Entity having both existence and identity dependence upon the Thing Entity, means
that it can be related to one and only one of the “Things” that gives it identity. This combination makes a
“time varying relationship” between the Event Entity and the Thing Entity impossible; if the relationship
were to vary over time, the identity of the Event Entity would also change. For example, we cannot
meaningfully talk about “changing” the Account a Transaction is related to. We can only “correct” a
mistaken assignment of a Transaction to the wrong Account. This is why when we “transfer” funds from
one Account to another, there are two Transactions - a debit to one Account and a credit to the other
Account. This inability to have a time varying relationship between an Event and its Thing allows us to
forego having a separate Assignment table representing the relationship between a Transaction and its
Account; we “bury” the Account’s key in the Transaction as a required foreign key attribute. This also,
incidentally, further demonstrates the Transaction’s identity dependency upon the Account. The same
reasoning applies to the other Event Entity-set in the Application, Action.




By David Tryon, November 1998

More Related Content

PPTX
ER modeling
PDF
Data warehousing and business intelligence project report
PPT
My2dw
PDF
Data Warehouse Project Report
PPTX
Temporal databases
PPTX
Data Warehouse by Amr Ali
PDF
Data warehousing and machine learning primer
PPT
An introduction to data warehousing
ER modeling
Data warehousing and business intelligence project report
My2dw
Data Warehouse Project Report
Temporal databases
Data Warehouse by Amr Ali
Data warehousing and machine learning primer
An introduction to data warehousing

What's hot (20)

PDF
Data Warehouse Design & Dimensional Modeling
PPTX
Data Warehousing AWS 12345
PPTX
Dimensional Modeling
DOCX
Data warehouse Project Report
PPTX
Dimensional Modelling - Basic Concept
PPT
Warehouse components
PPTX
Introduction Data warehouse
PPTX
Dimensional Modeling Basic Concept with Example
PPT
Reports vs analysis
PDF
Business analytics and data warehousing
PDF
BigData Analytics_1.7
PDF
Data warehousing interview_questionsandanswers
PDF
A guide to preparing your data for tableau
PPT
Data mining
DOCX
Example data specifications and info requirements framework OVERVIEW
PPT
Part1
DOC
Dw hk-white paper
PPT
Data Mining
PPT
Data Quality Testing Generic (http://www.geektester.blogspot.com/)
PPT
Cssu dw dm
Data Warehouse Design & Dimensional Modeling
Data Warehousing AWS 12345
Dimensional Modeling
Data warehouse Project Report
Dimensional Modelling - Basic Concept
Warehouse components
Introduction Data warehouse
Dimensional Modeling Basic Concept with Example
Reports vs analysis
Business analytics and data warehousing
BigData Analytics_1.7
Data warehousing interview_questionsandanswers
A guide to preparing your data for tableau
Data mining
Example data specifications and info requirements framework OVERVIEW
Part1
Dw hk-white paper
Data Mining
Data Quality Testing Generic (http://www.geektester.blogspot.com/)
Cssu dw dm
Ad

Similar to Temporal Case Management 1998 (20)

PDF
Checking and verifying temporal data
PDF
Unit 5_ Advanced Database Models, Systems, and Applications.pdf
PPTX
data Modelling in Database introduction and design.pptx
PDF
BI-TEMPORAL IMPLEMENTATION IN RELATIONAL DATABASE MANAGEMENT SYSTEMS: MS SQ...
PDF
Date Analysis .pdf
PDF
Data Warehousing concepts for Data Engineering
PDF
Temporal database
PPT
Data warehouse
PPTX
1.2 CLASS-DW.pptx-data warehouse design and development
PDF
IRJET- Mining Frequent Itemset on Temporal data
PPTX
Temporal_Data_Warehouse.pptx
PDF
Database Systems Essay
PDF
Updating and Scheduling of Streaming Web Services in Data Warehouses
DOCX
Designing the business process dimensional model
PDF
The Structured Data Hub in 2019
DOC
ETL QA
PDF
Case study process mining with facility management data
DOCX
Dimensional data model
DOCX
Data miningvs datawarehouse
Checking and verifying temporal data
Unit 5_ Advanced Database Models, Systems, and Applications.pdf
data Modelling in Database introduction and design.pptx
BI-TEMPORAL IMPLEMENTATION IN RELATIONAL DATABASE MANAGEMENT SYSTEMS: MS SQ...
Date Analysis .pdf
Data Warehousing concepts for Data Engineering
Temporal database
Data warehouse
1.2 CLASS-DW.pptx-data warehouse design and development
IRJET- Mining Frequent Itemset on Temporal data
Temporal_Data_Warehouse.pptx
Database Systems Essay
Updating and Scheduling of Streaming Web Services in Data Warehouses
Designing the business process dimensional model
The Structured Data Hub in 2019
ETL QA
Case study process mining with facility management data
Dimensional data model
Data miningvs datawarehouse
Ad

Temporal Case Management 1998

  • 1. Case Management: Modeling Time Varying Relationships Background The Case Management System needs to be able to capture, store, and report upon the relationships among a number of business entities of interest to the Company in its efforts to track and respond to investigation related activities. These entities include questionable Transactions, Accounts, Cases, Various Persons and Organizations, and Investigation Management Issues. It is not enough to know what the relationships among these entities are at one moment in time, say the current moment. Investigators need to be able to look for patterns of relationships through time. Criminals are not so accommodating as to make their intentions clear. In many significant ways, a Case is the recording of a pattern of changing relationships through a period of time. Traditional data modeling represents the facts about a company as they exist at a point in time. The databases designed from such models have the same characteristic. Representing time varying relationships among facts requires the use of what are called temporal or historical database techniques. This approach is very powerful in terms of what can be recorded and reported upon; it is, however, considerably more complex than normal business system database designs. Purpose The purpose of this paper is to document the specific design approach taken to modeling time in a Case Management Application. This documentation will be of use to the user in understanding “what is going on” in the application, as well as giving them a sense of what kinds of time varying information can and cannot be stored in the Application. It will also be useful to future technical personnel maintaining the application. Technical Principles The data design principles used in the Case Management System‟s database design are based upon Peter Chen‟s Entity-Relational Model1, Codd‟s Extended Relational Model2, Snodgrass‟ three levels of representing time in a relational database 3. 
Considering Chen‟s work, data about the enterprise can be described as Entity-sets, Relationship-sets, and Value-sets. Codd mapped these set theory ideas onto the mathematics of relations; this relational interpretation of the Chen Entity-Relationship model forms the basis of our database design in the Case Management System. The Entity-sets and Relationship-sets are represented as tables. Each member of an Entity-set or Relationship-set is a row in its table. The Value-set values that are functionally mapped from the Entity-sets and Relationship sets in the Entity-Relational By David Tryon Page 1 November 1998
  • 2. Case Management: Modeling Time Varying Relationships Model become a value on a row in a table column. So far, this is standard data modeling / database design. In the Case Management System, “Main Function” tables and “Lookup” tables implement the Entity-sets; the “Assignment” tables implement the Relationship-sets. Several of the Relationship-sets do not have explicit “Assignment” tables, but have their relationships represented by an embedded “foreign key” in the core tables, specifically the Action and Transaction entity‟s relationships to Account and Case. The reason for this is that Action and Transaction represent events - point in time occurrences - whereas Case, Account, Known Party, and Case Management Issues represent business objects or “things” that have duration - exist for a span of time. For a more complete explanation, read the endnote.4 Temporal / Historical Database Design The Entity-Relationship Model represents a “snapshot” in time; the current view of the known facts, the attributes of the Entity-sets and Relationship-sets. This “current snapshot in time” perspective is also true of the databases developed from models. Even “historical” facts captured in these databases represent the “current” time‟s point of view about what was true at that point in time in the historical past being described. It represents “what we know now about what was true then.” A consequence of this “current” perspective is that historical facts are represented as separate attributes from current facts and are implemented as two separate fields. If we need to save all past values, the set of past values becomes its own table with each past value represented as a row in the “historical” table. This approach is so standard that it is not even thought of as odd. It would mean that if we wanted complete history on every attribute, every attribute would become its own table. 
For several decades, database theorists have been trying to deal with the whole issue of time in a more elegant and less developer intensive way. There are three possible types of time varying databases: Transaction, Historical, and Temporal. Transaction Databases. In Transaction Time, the times when changes are made to values stored in the database are recorded; regular database technology does this with logging and recovery. These databases can “rollback” updates to an earlier point in time through a “database recovery.” This form of time representation is seldom used in an Application. Historical Databases. Historical databases explicitly record when each fact in the database changes out in the “real world.” This means that when a change is recorded to a value stored in the database, the time when the change actually occurred in the business must also be recorded. This form of Historical Database is just beginning to become available commercially in some high-end databases. An Historical Database has an interesting effect on an application written against it. The idea of “current” goes away; “current” becomes a matter of selection criteria. Retrieval with today‟s date is “current.” By David Tryon Page 2 November 1998
Retrieval with last month's date is “historical.” Retrieval with next month's date is “pending.”

Temporal Databases. Temporal databases explicitly record not only when each fact changes in the “real world,” but also when each fact was recorded in the database. This means that not only can we retrieve the true “change” history; we can also retrieve a complete history of all “corrections” and “cancellations” ever made. This database form combines both the Transaction and the Historical time capabilities. It requires two time points for retrieval: the “As Of” time of the data and the “As Viewed From” time for the report. Full temporal databases are an active research area, but are not yet available as commercial products.

The Case Management System Application is implementing a limited version of a Historical Database. We are capturing time history on the tables representing Relationship-sets in the data model. These are the Assignment tables in the Application. To accomplish this, we include a Begin Date and an End Date field in each Assignment table row. The idea is that when a relationship is established between two entities (what we generally refer to as “an assignment”), the Begin Date is set to the effective date of the “assignment.” Later, when the relationship is ended, we set the End Date to the termination date. We do not delete the row from the Assignment table. Retrieval of data for reports must always limit the selected Assignment rows to those that were effective within the desired time-period.

Some thoughts about “Time”: Temporal Granularity, Points-in-time, and Time-Period (Duration)

About here, readers must be asking themselves whether all this is necessary; after all, we're just talking about a little desktop application, right? The truth is that we all use these ideas on a day to day basis.
We insist on clarity when discussing time-periods - when filling in a time sheet for an hourly contractor, is the paid period through or until the end time point? An hour's pay rides on the distinction. What is the unit of time to be used - the temporal granularity? Is the work shift from 8 A.M. until 4 P.M. (or through 4 P.M.)? Is it from 8:00 A.M. until 4:00 P.M. (that is, through 3:59 P.M.)? Do we pay for 8 hours or 8 hours and one minute?

People are great at contextual reasoning! In case of doubt, a person would just say, “You know, 8 hours pay” and the fuzziness would be resolved. Computers absolutely fail with fuzzy situations, as do payroll clerks. So, we must be clear or the reports retrieved from different points of view may not “fit together” too well. Anyone who has ever tried to balance the dollars between a weekly payroll system and a monthly general ledger system understands the consequences of incommensurate temporal granularities.

In the Case Management System Application, time is recorded to the limits of the Microsoft Date data type - some millionths of a second. Time reporting requests will be in days, unless otherwise specified (such as when an Action occurred).
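The time sheet question above can be worked through as a small arithmetic sketch. The function name paid_duration, its parameters, and the calendar date are assumptions made for illustration; only the 8 A.M. to 4 P.M. shift comes from the text. The sketch shows how the answer depends on both whether the end point is included (“through”) or excluded (“until”) and on the granularity in use.

```python
from datetime import datetime, timedelta


def paid_duration(start, end, granularity, end_inclusive):
    """Length of a period whose end point is either excluded ("until")
    or included ("through") at the stated temporal granularity."""
    duration = end - start
    if end_inclusive:
        # "Through" also pays for the final unit of time.
        duration += granularity
    return duration


# Arbitrary date; only the clock times matter here.
shift_start = datetime(1998, 11, 2, 8, 0)
shift_end = datetime(1998, 11, 2, 16, 0)

until_4pm = paid_duration(
    shift_start, shift_end, timedelta(minutes=1), end_inclusive=False
)
through_4pm_by_minute = paid_duration(
    shift_start, shift_end, timedelta(minutes=1), end_inclusive=True
)
through_4pm_by_hour = paid_duration(
    shift_start, shift_end, timedelta(hours=1), end_inclusive=True
)
```

At minute granularity, “through” adds the one extra minute mentioned above; at hour granularity it adds a full hour, which is the hour's pay that rides on the distinction.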
Some reports are logically point-in-time oriented. An example would be a report to see all “Currently Open Cases.” Others are inherently for a time-period, such as “Cases Opened During the Month.” In order to simplify the query process and make it more uniform, all queries will be stated as a duration with a Begin Date and an End Date. This approach will work because we will assume all queries to be inclusive of the Begin Date and the End Date. Therefore, a single point-in-time, a day, is represented when the Begin Date and the End Date are equal.

An important comment about unspecified End Dates: when an assignment begins, the End Date is usually not yet known. Rather than “kludging” the End Date field with some arbitrarily high date value (for example, January 1, 2500) so that date comparison logic will “work,” it has been decided to be correct and leave the End Date of currently “open” assignments as “Null.” There is no value assigned to the End Date field. This means that retrieval logic must treat a “Null” End Date as greater than any specified Reporting End Date for comparison purposes.

The difference between the stored temporal granularity (microsecond) and the retrieval temporal granularity (day) has some practical consequences. The Report Begin Date must be converted to midnight at the start of the day on which the report period begins (e.g., 2/23/97 becomes 2/23/97 @ 00:00:00). The Report End Date must be converted to one second before midnight at the end of the day (e.g., 2/27/97 becomes 2/27/97 @ 23:59:59). This convention also supports the “point-in-time” convention of making the Begin and End Dates the same to reference a single day. If the report were trying to capture all the Investigative Actions taken on a given day, the correct (and complete) set of Actions would be retrieved.
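The date conversion described above can be sketched in a few lines of Python. The function name report_window and the sample Action timestamp are hypothetical; the two conversions and the example dates are the ones stated in the text.

```python
from datetime import date, datetime, time


def report_window(report_begin, report_end):
    """Expand day-granularity report dates into inclusive timestamps."""
    # Midnight at the start of the first day.
    begin_ts = datetime.combine(report_begin, time(0, 0, 0))
    # One second before midnight at the end of the last day.
    end_ts = datetime.combine(report_end, time(23, 59, 59))
    return begin_ts, end_ts


begin_ts, end_ts = report_window(date(1997, 2, 23), date(1997, 2, 27))

# Point-in-time convention: the same day for both dates covers that whole day.
day_begin, day_end = report_window(date(1997, 2, 23), date(1997, 2, 23))
action_time = datetime(1997, 2, 23, 14, 30)  # hypothetical Action timestamp
action_selected = day_begin <= action_time <= day_end
```

One caveat, offered as an observation rather than as part of the design: a timestamp carrying a fractional part within the final second of the day would fall just outside this window. Comparing against the following midnight with a strict “less than” is a common alternative, although it is not the convention specified here.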
“Now” type reports will use the Report End Date as the “Now” so that reports “as of a day” will be inclusive of anything that happens during that day.

What Assignments Are “Valid” During a Time-Period?

If there are a number of assignment records being tested to see if they are to be selected as part of the reporting time-period - a duration - what constitutes “being valid” during the time period? For example, let us say we are selecting for the month of February. Does the assignment have to be valid for the entire month to be accepted? What if the assignment becomes valid halfway through the month; is it accepted? What if it is valid at the beginning of the month and expires halfway through; is it accepted? What if it becomes valid on the fifth and expires on the 21st?

In the Case Management System Application the standard assumption will be that any assignment that was valid for any portion of a time period will be considered valid and selected. If we only want assignments valid on the last day of the month, select for only the last day, not the entire month. If we only want assignments valid on the first day and still valid on the last day of the month, the selection must be explicitly defined in that way through a custom query.

Standard Begin / End Date Query Logic

The issue will always come down to the precise structure of the Assignment Begin Date and Assignment End Date comparisons with the Report Begin Date and Report End Date.

Given our definition of “included in the time period” above, there are several cases to be considered. These alternatives are described in the table below. The report period runs from the Reporting Begin Date to the Reporting End Date; each data validity period runs from its Begin Date (B) to its End Date (E).

    Case   Data Validity Period    Relation to the Report Period
    1      B1 to E1                Included
    2      B2 to E2 or NULL        Includes
    3      B3 to E3                Ends During
    4      B4 to E4 or NULL        Begins During
    5      B5 to E5                Ends Before
    6      B6 to E6 or NULL        Begins After

As the reader can see, the inclusion rules get quite complex and compounded. However, the exclusion rules are much simpler. They reduce to just two cases:

#6 - Case A: The Data Begin Date is greater than the Report End Date.
#5 - Case B: The Data End Date is less than the Report Begin Date.

If either of these conditions is true, the data is NOT selected. In all other cases, select the data.

Logically, this rule nicely takes care of active, open Assignment records having a Null Data End Date. Null in the comparison would force a false in Case B, which is the correct result because logically the Null represents a high-value date. Unfortunately, there is a data type issue. It is necessary to test separately for the combination in which the Data End Date is Null. This is annoying, but not an insurmountable obstacle.
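The two exclusion rules, together with the separate Null test, can be sketched as a single predicate. This is an illustrative sketch only: the function name is_selected and the sample dates are assumptions, chosen so that one validity period falls into each of the six numbered cases. Python's None stands in for a Null End Date.

```python
from datetime import date


def is_selected(data_begin, data_end, report_begin, report_end):
    """Select unless the data period begins after or ends before the
    report period. data_end may be None for an open assignment."""
    # Case A: the data period begins after the report period ends.
    begins_after = data_begin > report_end
    # Case B: the data period ends before the report period begins.
    # A Null (None) End Date is tested separately and never "ends before".
    ends_before = data_end is not None and data_end < report_begin
    return not (begins_after or ends_before)


report_begin, report_end = date(1997, 2, 1), date(1997, 2, 28)

# Hypothetical validity periods, one per numbered case.
cases = {
    "1. Included": (date(1997, 2, 5), date(1997, 2, 21)),
    "2. Includes": (date(1997, 1, 10), None),
    "3. Ends During": (date(1997, 1, 10), date(1997, 2, 14)),
    "4. Begins During": (date(1997, 2, 14), None),
    "5. Ends Before": (date(1997, 1, 2), date(1997, 1, 20)),
    "6. Begins After": (date(1997, 3, 3), None),
}
results = {
    name: is_selected(begin, end, report_begin, report_end)
    for name, (begin, end) in cases.items()
}
```

Only the “Ends Before” and “Begins After” periods are rejected; the other four are selected without having to spell out each inclusion case.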
Implementing the Date Comparison Logic in SQL

In order to implement this in the Case Management System Application's database, the propositional logic above must be translated into the appropriate relational algebra expressions, and from there into an SQL “WHERE” clause. As we are implementing this application in Access, we will target this conversion in terms of the syntax of the Access Query Design Grid.

The Access Query Design Grid expresses the components of the generated SQL “WHERE” clauses as successive rows in the "Criteria" section. Each row in the criteria section represents a set of conditions that are logically “AND'ed” together. For a data record to be selected by a row, it must pass all of the criteria conditions on that row. Multiple criteria rows are inclusively “OR'ed” together. This means that if a data record passes any row, it is selected. It does not need to pass more than one row to be selected. This syntax means that we must manipulate our acceptance criteria into the form of one or more sets of “AND'ed” conditions. That is the goal.

So, where do we start? We know from the previous section that we have two logical tests, Case A and Case B. Furthermore, we know that if a data record satisfies either test, it is excluded. What we need is a transformation to restructure the test in the form of “(condition-1) AND (condition-2) AND (condition-3) AND ...” to fit in with the syntax of the Access Query Design Grid.

Let us start by translating our exclusion criteria into inclusion criteria. Let us simplify our notation so that Case A is known simply as A and Case B is known simply as B. The exclusion rule is:

If A OR B = TRUE then exclude the record.

The opposite of this rule, the inclusion rule, is obtained by logically negating this expression. In other words, the inclusion rule is:

If NOT(A OR B) = TRUE then include the record.
We now have an inclusion rule, but it is not in the “right” form for the Access Query Design Grid. Negated compounds do not fit in the grid - or, for that matter, fit comfortably in a “WHERE” clause. Fortunately there is a logical equivalence theorem - De Morgan's law, known to programming types as the “propagation of 'nots'” - that states the following equivalence:

NOT (A OR B) = (NOT A) AND (NOT B)

We substitute this equivalent expression into our “inclusion” rule and it now reads:

If (NOT A) AND (NOT B) = TRUE then include the record.
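The equivalence can be checked exhaustively, since A and B can each take only two values. The following short sketch, with variable names chosen purely for illustration, builds the four-row truth table and compares the two forms on every row.

```python
from itertools import product

# Enumerate every combination of truth values for A and B.
rows = [
    (a, b, not (a or b), (not a) and (not b))
    for a, b in product([False, True], repeat=2)
]

# The two forms are equivalent if they agree on every row of the table.
equivalent = all(
    negated_or == and_of_nots for _, _, negated_or, and_of_nots in rows
)
```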
The inclusion rule is now in the “right” form to be expressed in the Access Query Design Grid. Let us recall where we are. Our compound rule fully expanded is:

If (NOT Case A) AND (NOT Case B) = TRUE then include the record.

Case A = Data Begin Date is greater than the Report End Date
Case B = Data End Date is less than the Report Begin Date

Inequality logic comparisons are a little tricky. We must remember that:

If X less than Y = TRUE then NOT (X greater than or equal Y) = TRUE

Therefore, applying this rule to Case A and Case B:

NOT Case A = Data Begin Date is less than or equal to the Report End Date
NOT Case B = Data End Date is greater than or equal to the Report Begin Date

Now, substituting these definitions of NOT Case A and NOT Case B into the expanded “inclusion” rule above, we get:

If (Data Begin Date is less than or equal to the Report End Date) AND (Data End Date is greater than or equal to the Report Begin Date) then include the record.

Now, the Access Query Design Grid has a column for each data element in the record. In particular, it has columns for Data Begin Date and Data End Date. Therefore, the entry in the Access Query Design Grid looks like the following:

                Data Begin Date           Data End Date
    criteria    <= [Report End Date]      >= [Report Begin Date]

Because of the somewhat inelegant way that Access handles Nulls in data type specific comparisons, we must explicitly include a second criteria row to account for the possibility of the Data End Date being Null. The Access Query Design Grid then looks like this:

                Data Begin Date           Data End Date
    criteria    <= [Report End Date]      >= [Report Begin Date]
    criteria    <= [Report End Date]      Is Null

The design technique used in the FDID Application is to replace the “Report Begin Date” and “Report End Date” statements in the Access Query Design Grid with references to the values of Controls on the Report Date Range Input form, which is held open, but not visible, during report creation.
As long as all underlying queries for reports include this logic, only the correct assignment records will be selected by the queries and submitted to the report subsystem for report creation.
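To show how the two criteria rows behave as a “WHERE” clause, here is a minimal sketch that runs the equivalent SQL through Python's built-in sqlite3 module instead of Access. The table name case_party_assignment, its columns, and the sample rows are hypothetical, and the named parameters :report_begin and :report_end stand in for the form Controls described above. Dates are stored as ISO-formatted text so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Assignment table, for illustration only.
conn.execute("""
    CREATE TABLE case_party_assignment (
        case_id    INTEGER NOT NULL,
        party_id   INTEGER NOT NULL,
        begin_date TEXT NOT NULL,
        end_date   TEXT               -- NULL while the assignment is open
    )
""")
conn.executemany(
    "INSERT INTO case_party_assignment VALUES (?, ?, ?, ?)",
    [
        (1, 10, "1997-02-05 09:00:00", "1997-02-21 17:00:00"),  # within February
        (1, 11, "1997-01-10 09:00:00", None),                   # open, began earlier
        (1, 12, "1997-01-02 09:00:00", "1997-01-20 17:00:00"),  # ended before
        (1, 13, "1997-03-03 09:00:00", None),                   # begins after
        (1, 14, "1997-02-14 09:00:00", None),                   # open, begins during
    ],
)

# An assignment is ended by setting its End Date; the row is not deleted.
conn.execute(
    "UPDATE case_party_assignment SET end_date = ? WHERE party_id = ?",
    ("1997-02-14 12:00:00", 11),
)

SELECT_VALID = """
    SELECT party_id FROM case_party_assignment
    WHERE (begin_date <= :report_end AND end_date >= :report_begin)  -- first criteria row
       OR (begin_date <= :report_end AND end_date IS NULL)           -- second criteria row
    ORDER BY party_id
"""
params = {
    "report_begin": "1997-02-01 00:00:00",
    "report_end": "1997-02-28 23:59:59",
}
selected = [row[0] for row in conn.execute(SELECT_VALID, params)]
row_count = conn.execute(
    "SELECT COUNT(*) FROM case_party_assignment"
).fetchone()[0]
```

Each parenthesized group corresponds to one criteria row of “AND'ed” conditions, and the OR between them corresponds to the grid's treatment of multiple rows.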
Endnotes

1 Chen, P.P.; "The Entity-Relationship Model: A Basis for the Enterprise View of Data"; Proceedings of the 1977 National Computer Conference; Dallas, Texas; AFIPS Conference Proceedings, Volume 46; and Chen, P.P.; "The Entity-Relationship Model: Toward a Unified View of Data"; ACM Transactions on Database Systems, Volume 1, Number 1; March 1976.

2 Codd, E.F.; "Extending the Database Relational Model to Capture More Meaning"; SIGMOD Proceedings; Boston, Mass.; June 1979.

3 http://www.cs.arizona.edu/people/rts/ and Snodgrass, Richard T.; Developing Time-Oriented Database Applications in SQL; Morgan Kaufmann, San Francisco, California; 2000.

4 “Thing” Entities versus “Event” Entities. Entity-sets can represent either “things” or “events.” Events differ from Things in that they exist only at a point in time; Things have duration. An Event is what it is because it represents the recording of an action against some Thing that changes the state of the Thing. For example, a Transaction is an action against the current balance of an Account. There are two important points: 1) the Transaction Event occurs at a single point of time from the point of view of the acted-upon Thing, the Account; 2) without the Thing, the Account, the Transaction Event has no meaning - more formally, the Transaction has both existence and identity dependence upon the Account. The first point means that an Event Entity has no duration, no “history,” as it exists at only a single point of time. The second point, the Event Entity having both existence and identity dependence upon the Thing Entity, means that it can be related to one and only one of the “Things” that gives it identity. This combination makes a “time varying relationship” between the Event Entity and the Thing Entity impossible; if the relationship were to vary over time, the identity of the Event Entity would also change.
For example, we cannot meaningfully talk about “changing” the Account a Transaction is related to. We can only “correct” a mistaken assignment of a Transaction to the wrong Account. This is why, when we “transfer” funds from one Account to another, there are two Transactions - a debit to one Account and a credit to the other Account. This inability to have a time varying relationship between an Event and its Thing allows us to forgo having a separate Assignment table representing the relationship between a Transaction and its Account; we “bury” the Account's key in the Transaction as a required foreign key attribute. This also, incidentally, further demonstrates the Transaction's identity dependency upon the Account. The same reasoning applies to the other Event Entity-set in the Application, Action.

By David Tryon, November 1998