We begin with "substitute" models. These are models created to represent, within an information system, an image of some real-world entity. (Such models rest on the assumption that the object being modeled is entirely knowable and understood in all its aspects. Consequently, all modeling problems and defects are attributed solely to insufficient (superficial) familiarity with the object.)
Examples of "substitute" models:
- Representation of a Client in a CRM system
- Representation of a Client's Account in a Banking System
- A Vehicle in an Insurance Accounting System
Steps to be Addressed During the Modeling Phase
I. Decomposition: How do we divide the system being modeled and identify its main components (objects)?
II. Descriptiveness: How do we determine the necessary and sufficient attributes (and related functionality) for each component, so that the modeled system/entity is accurately represented within the information system?
III. Interdependencies (internal, inter-component, and external): How do we formalize and categorize the existing relationships between components/objects, as well as between individual attributes of these components?
A Brief Note on Necessary and Sufficient Conditions
Let's consider this aspect based on attribute analysis of an entity.
- Necessary attributes are those that allow us to: a) Uniquely identify an entity instance b) Classify this entity (assign it to a particular category)
- Sufficient attributes are all those required to fulfill the specific functionality of the discussed information system.
For example, for a vehicle in the context of insurance:
- Necessary (identification): the VIN (not the license plate number, engine number, chassis number, etc.)
- Necessary (classification): purpose, body type, capacity, or payload capacity (according to industry and country-specific classifications)
- Sufficient: year of manufacture, purchase date, manufacturer, engine power, initial cost, brand, additional equipment, damages (accidents), mileage, number of owners, estimated value at a specific date, warranty status, service plan presence, etc.
The list of sufficient attributes depends directly on how the insurance premium is calculated (that is, on which attributes are mandatory and which are optional). Hence, we can define the threshold of sufficiency as follows:
- All attributes used in the calculation algorithms, excluding those that can be derived through computation or cross-referenced with reference data.
For example, the vehicle's depreciation category is a "derived" attribute. It is calculated, using a formula and reference data (which is itself a separate subject for consideration), from:
- Manufacturer
- Model
- Year of manufacture
- Mileage
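The derivation of such an attribute can be sketched in Python. This is only an illustration: the reference table, the thresholds, the category names, and the manufacturer/model values are all invented here, and a real insurer would take them from its own rating rules.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical reference data: expected yearly mileage (km) per (manufacturer, model).
EXPECTED_KM_PER_YEAR = {
    ("MakerA", "ModelX"): 15_000,
    ("MakerB", "ModelY"): 30_000,
}
DEFAULT_KM_PER_YEAR = 20_000


@dataclass(frozen=True)
class Vehicle:
    vin: str
    manufacturer: str
    model: str
    year_of_manufacture: int
    mileage_km: int
    # Note: no depreciation_category field. It is derived, not stored.


def depreciation_category(vehicle: Vehicle, as_of: date) -> str:
    """Derive the category from stored attributes (illustrative formula only)."""
    age_years = max(as_of.year - vehicle.year_of_manufacture, 0)
    expected = EXPECTED_KM_PER_YEAR.get(
        (vehicle.manufacturer, vehicle.model), DEFAULT_KM_PER_YEAR
    )
    # Actual mileage relative to what the reference data expects for this age.
    usage_ratio = vehicle.mileage_km / (expected * max(age_years, 1))
    if age_years <= 3 and usage_ratio <= 1.0:
        return "low"
    if age_years <= 8 and usage_ratio <= 1.5:
        return "medium"
    return "high"
```

Because the category is computed on demand, it stays below the threshold of sufficiency: only the four source attributes and the reference data need to be stored.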
Decomposition of the Modeled System
Decomposition can be performed using various methods, but since we are discussing functional-structural models, the most common approach is functional decomposition.
This involves splitting the system into components, each possessing the following properties:
- A. They are systems of a lower level (e.g., in a car, components include the engine, suspension, wheels, brakes—not individual bolts or nuts).
- B. They realize, in a specific and unique way, a common property of the modeled system (e.g., the engine provides movement; it can be electric, gasoline, diesel, or steam).
- C. They have clearly defined boundaries to distinguish them within the modeled system.
Common Errors in Decomposition
- Incorrect selection of "depth" for decomposition: Over-decomposition complicates analysis and storage, leading to unnecessary costs. Under-decomposition ignores significant aspects of component functioning, degrading the overall system description. (For example, in a bank, it may suffice to store demographic data about a client, but in medical systems, detailed health information about all organs might be necessary.)
- Violation of logical and terminological consistency: Mixing different principles or methods applied to components, or inconsistency in terminology, units, or measurement methods. (For example, describing a car by combining functional and structural methods produces a list that mixes a functional component, the engine, with positional ones: front-right wheel, front-left wheel, etc.)
Descriptiveness of Components and Systems
Each component (and the entire model) has a unique set of attributes that describe its significant characteristics (assuming correct decomposition).
Attributes of systems and components are categorized into four types:
- Identifiers: Necessary for the unique identification of the component/system. They are divided into external and internal ones. External identifiers are assigned at origination and belong to an identification scheme maintained by an external authority. Internal identifiers have meaning ONLY within the modeling environment. (e.g., VIN, Account Number, ClientID, IBAN)
- Parameters: Measurable characteristics describing some functionality. They can be binary, enumerated, continuous, or discrete. (e.g., vehicle mass (continuous), an insect's number of legs (discrete), car body color (enumerated), a house's ownership status (binary)).
- Properties: Qualities affecting component/system operation, which can be binary, combinable, or mutually exclusive. (e.g., engine type: electric, gasoline, diesel, steam (mutually exclusive); bank account with or without overdraft protection (binary); a car with four-wheel drive (combinable properties)).
- Categories: Used for classification and categorization. These are typically derived attributes based on parameter/property combinations and classifiers. (e.g., biological classification: Homo sapiens sapiens: bipedal, intelligent, vertebrate, mammal, currently living.)
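One way to make these four types explicit in a model definition is sketched below. The class names, the `derived` and `external` flags, and the sample attribute list are assumptions made for illustration, not part of any particular modeling tool.

```python
from dataclasses import dataclass
from enum import Enum


class AttributeKind(Enum):
    IDENTIFIER = "identifier"
    PARAMETER = "parameter"
    PROPERTY = "property"
    CATEGORY = "category"


@dataclass(frozen=True)
class AttributeSpec:
    name: str
    kind: AttributeKind
    derived: bool = False   # categories are typically derived
    external: bool = False  # only meaningful for identifiers


# Hypothetical attribute list for a vehicle component.
VEHICLE_ATTRIBUTES = [
    AttributeSpec("vin", AttributeKind.IDENTIFIER, external=True),
    AttributeSpec("internal_vehicle_id", AttributeKind.IDENTIFIER),
    AttributeSpec("mass_kg", AttributeKind.PARAMETER),
    AttributeSpec("body_color", AttributeKind.PARAMETER),
    AttributeSpec("engine_type", AttributeKind.PROPERTY),
    AttributeSpec("depreciation_category", AttributeKind.CATEGORY, derived=True),
]


def stored_attributes(specs):
    """Names of attributes that must be persisted (everything not derived)."""
    return [spec.name for spec in specs if not spec.derived]


def identifiers(specs, external):
    """Names of external or internal identifiers, depending on the flag."""
    return [
        spec.name
        for spec in specs
        if spec.kind is AttributeKind.IDENTIFIER and spec.external == external
    ]
```

Tagging each attribute with its type up front makes it easier to check later that, for example, no derived category is stored as if it were source data.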
One more important characteristic MUST be taken into account: variability.
Values of attributes may change in different ways:
1. Presumed immutable: Strictly speaking, such attributes can be changed, but any change is to be considered illegal, even if technically possible. (Example: a car's VIN)
2. Occasionally changing during the lifespan:
a) not causing reclassification (e.g., a passport number changes every time a person gets a new passport)
b) causing reclassification (e.g., increased engine wear leading to higher fuel consumption and maintenance costs)
3. Continuously changing during normal operations (business as usual): Changes depend on specific events (e.g., account balance after each transaction; when negative, the account becomes "overdrawn").
4. Traceable: All changes should be tracked and recorded, such as:
- in relation to the event initiator (e.g., balance after a transaction, or balance at the end of the operational day)
- at fixed time points (e.g., every 15 minutes or hourly)
- minimum, maximum, or average values (over the lifetime and/or over pre-defined time intervals).
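A traceable attribute can be sketched as a value with an append-only history. The class below is a minimal illustration under assumed names (`TrackedAttribute`, `record`, `summary`); it keeps everything in memory and ignores persistence entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean


@dataclass
class TrackedAttribute:
    name: str
    # Each entry is (timestamp, value, initiating event).
    history: list = field(default_factory=list)

    def record(self, at: datetime, value: float, event: str) -> None:
        """Append a new value together with the event that caused it."""
        self.history.append((at, value, event))

    def summary(self, start: datetime, end: datetime) -> dict:
        """Minimum, maximum, and average of values recorded in [start, end]."""
        values = [value for (at, value, _) in self.history if start <= at <= end]
        if not values:
            return {}
        return {"min": min(values), "max": max(values), "avg": mean(values)}
```

A usage example for an account balance:

```python
balance = TrackedAttribute("balance")
balance.record(datetime(2024, 1, 1, 9, 0), 100.0, "deposit")
balance.record(datetime(2024, 1, 1, 12, 0), -20.0, "card payment")
balance.record(datetime(2024, 1, 1, 17, 0), 40.0, "transfer in")
day = balance.summary(datetime(2024, 1, 1), datetime(2024, 1, 2))
```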
Common Errors in Defining Attributes
- Mixing attributes and functional parameters: Attributes are data describing object parameters; functionality transforms inputs into outputs based on properties.
- Modeling presumed-immutable or slowly changing attributes as constants: Although all attribute values should be monitored, such attributes are often modeled as constants, so unauthorized changes can go unnoticed.
- Selecting an inappropriate persistent storage method for capturing changes: Static and slowly changing attributes can be handled through versioning, while others may require specialized databases (such as a time-series database).
- Incorrect choice of change-tracking reference points: Changes are often tracked relative to the load date or extraction time, which ignores the difference between the moment of registration and the moment of retrieval, or the need to track changes relative to other attributes.
- Logical inconsistencies in capturing changing data: The capture of all modifications should be synchronized; for example, merging near-real-time data with end-of-day data in a data product inevitably causes inconsistency.
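The difference between the moment a change is registered and the moment it is loaded can be shown with a small sketch. The record layout and the field names `registered_at` and `loaded_at` are assumptions chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BalanceRecord:
    value: float
    registered_at: datetime  # when the change happened in the source system
    loaded_at: datetime      # when the record was extracted and loaded


def value_as_of(records, moment, time_field):
    """Latest value whose `time_field` timestamp is not later than `moment`."""
    eligible = [r for r in records if getattr(r, time_field) <= moment]
    if not eligible:
        return None
    return max(eligible, key=lambda r: getattr(r, time_field)).value


# Two changes registered during the day, both loaded by a nightly batch.
records = [
    BalanceRecord(100.0, datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 2, 1, 0)),
    BalanceRecord(-50.0, datetime(2024, 1, 1, 23, 0), datetime(2024, 1, 2, 1, 0)),
]
end_of_day = datetime(2024, 1, 1, 23, 59)

by_registration = value_as_of(records, end_of_day, "registered_at")
by_load_time = value_as_of(records, end_of_day, "loaded_at")
```

Here `by_registration` is the end-of-day balance of -50.0, while `by_load_time` is `None`, because nothing had been loaded yet at that moment. Tracking changes by load time alone would report that the account had no balance at the end of the day.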
Interdependencies
These can be simple or complex, direct or indirect. For substitute models, there is no immediate need for in-depth study of these interdependencies, except regarding data quality aspects:
- Defining mandatory and dependent attributes (e.g., if the engine type is electric, then fuel efficiency cannot be determined by fuel consumption per 100 km).
- Ensuring logical consistency of recorded values (e.g., which account parameters must be registered at the end of an operational day, or which attributes should be recorded depending on the chosen penalty calculation method).
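A dependency between attributes such as the engine-type example can be expressed as a validation rule. The sketch below assumes a record stored as a plain dictionary with hypothetical keys `engine_type` and `fuel_l_per_100km`.

```python
def validate_vehicle(record: dict) -> list:
    """Return a list of rule violations for one vehicle record (empty if valid)."""
    errors = []
    engine = record.get("engine_type")
    has_fuel_figure = record.get("fuel_l_per_100km") is not None

    # Dependent attribute: fuel consumption has no meaning for an electric engine.
    if engine == "electric" and has_fuel_figure:
        errors.append("fuel_l_per_100km must be empty for electric engines")

    # Mandatory attribute: combustion engines need a fuel consumption figure.
    if engine in {"gasoline", "diesel"} and not has_fuel_figure:
        errors.append("fuel_l_per_100km is required for combustion engines")

    return errors
```

Rules of this kind are the part of interdependency analysis that a substitute model cannot skip, since they directly determine whether a recorded combination of values is acceptable.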
Errors Related to Interdependencies
The most common issues involve oversimplifying interdependencies or making incorrect assumptions about how far certain attributes influence others. When attribute values are used, it is also common to see violations of the well-known principle that correlation is not causation. This usually results from an incorrect understanding of interdependencies within the modeled system/entity.
At this point, "modeling in general", the modeling of Data Products, and database design begin to diverge. These differences, along with the criteria for model correctness, will be discussed in the next post.