How to monitor the days on the data files of a Data Warehouse
Recipes of Data Warehouse and Business Intelligence

Are you the right one? Do you have what I expect? Have you lost some pieces?
(questions we should ask of every incoming DATA FILE)
The use case
• In this article we focus on the management of the loading day of the data file, the reference day of the data, and the expected number of rows. These issues have already been covered briefly in some of my previous articles published on SlideShare and on my blog. Now we see the practical application.
• As a real case we will use, as an example, the data file of the MTF markets (Multilateral Trading Facilities). A "row" control file has been associated with the data file; it contains the number of rows expected in the data file itself.
• The control file, created by hand for this purpose, is composed of three lines:
#MTF CONTROL FILE OF 20160314
ROWS = 160
#END OF MTF CONTROL FILE OF 20160314
• We suppose that the data file should arrive every working day, and that the reference day is the previous working day.
• The reference day is specified in the file name, but we must be careful, because the feeding system sets as reference the day of production of the data file, not the previous working day.
The control requirements
• Based on the information above, to get full control of the data file loading, the ETL system should provide me all the information necessary to fulfill the following requirements.
• We must have a clear vision of the characteristics of the data file, both of a general and of a purely technical nature: in particular, those linked to its name, its structure, the way the reference day is defined, and the structure of the control file (if present).
• So, we will define the temporal characteristics of the data file by using a code that represents its management.
• For convenience, I summarize the ways in which the feeding system can tell me the reference day (a small extraction sketch follows the list).

Where is the reference day of the data?
• In the name of the data file
• Inside the data file: in the heading, in the tail, or in a column of the data file
• Outside the data file
• Missing: assume the system date
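In the MTF case the day travels in the file name. As a minimal sketch, assuming the positions configured later in this article (START_NUM = 12, SIZE_NUM = 8, MASK_TXT = YYYYMMDD), the extraction could look like this in Oracle SQL; it is an illustration, not the MEF code:

-- Illustrative only: extract the reference day from the data file name,
-- using the same positions configured later as START_NUM = 12, SIZE_NUM = 8
-- and the day format MASK_TXT = YYYYMMDD.
SELECT TO_DATE(SUBSTR('mtf_export_20160314.csv', 12, 8), 'YYYYMMDD') AS reference_day
  FROM dual;
-- If the day is missing from every location, the fallback is the system date:
-- SELECT TRUNC(SYSDATE) AS reference_day FROM dual;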
• We must have a clear vision of the internal structure of the data file, i.e. of the columns that constitute it. For each column, as much metadata as possible must be available.
• Both static metadata, such as the type or length, and dynamic metadata, such as the presence of a domain of values, or whether the column is part of the unique key (a sketch of this kind of metadata follows).
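In the MEF this field structure is configured in mtf.csv; its exact layout is not reproduced here, so the following is only a hypothetical sketch of the per-column metadata the requirement asks for (table and column names are illustrative, not the MEF ones):

-- Hypothetical sketch of a per-column metadata table (illustrative names):
-- static metadata (type, length) plus dynamic metadata (domain of values,
-- membership in the unique key).
CREATE TABLE io_col_cft_sketch (
  io_cod     VARCHAR2(10),          -- data file identifier (e.g. MTF)
  col_name   VARCHAR2(30),          -- column name in the data file
  col_type   VARCHAR2(20),          -- data type (e.g. VARCHAR2, NUMBER, DATE)
  col_len    NUMBER,                -- declared length
  domain_txt VARCHAR2(200),         -- optional list of admitted values
  uk_flag    NUMBER(1) DEFAULT 0    -- 1 if the column is part of the unique key
);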
• We must have a calendar table that, for each calendar day (one row per day), tells me whether I expect the arrival of the data file and what the expected reference day inside the data file of that day is.
• If the data file contains more than one day, I need to know the range of days that I expect (see the sketch below).
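A minimal sketch of such a per-day table, in the spirit of the IODAY_CFT table described later (the real MEF layout may differ):

-- Hypothetical sketch, one row per calendar day and per data file,
-- in the spirit of the IODAY_CFT table described later.
CREATE TABLE ioday_cft_sketch (
  io_cod      VARCHAR2(10),   -- data file identifier (e.g. MTF)
  day_ymd     VARCHAR2(8),    -- calendar day, format YYYYMMDD
  fr_ymd      VARCHAR2(8),    -- expected arrival day (null = no file expected)
  dr_ymd      VARCHAR2(8),    -- expected reference day inside the data file
  from_dr_ymd VARCHAR2(8),    -- start of the expected day range (multi-day files)
  to_dr_ymd   VARCHAR2(8)     -- end of the expected day range (multi-day files)
);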
• We need to know the final outcome of the processing: the final state and the time taken. If the load has had problems, I need to know the error produced and the programming module that generated it (a log-table sketch follows).
• If the outcome is negative, we have to know exactly why we are in error. For example, if the consistency check has failed, I need to know at what point it occurred.
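A minimal sketch of a log table covering these points; the names are illustrative, not the actual MEF log structures:

-- Hypothetical sketch of a processing log table (illustrative names):
-- final state, time taken, error produced and offending module.
CREATE TABLE io_log_sketch (
  io_cod     VARCHAR2(10),      -- data file identifier
  start_ts   TIMESTAMP,         -- start of the elaboration
  end_ts     TIMESTAMP,         -- end of the elaboration
  state_cod  VARCHAR2(10),      -- final state, e.g. OK / KO
  err_txt    VARCHAR2(4000),    -- error produced, if any
  module_txt VARCHAR2(100)      -- programming module that generated the error
);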
• We need to know the final outcome of the control on the loading day and the reference day.
• To get the final outcome of these controls, we have to implement a control logic similar to that shown in the next figure (and sketched in SQL right after it).
• In dark green, the definitely correct situations. In red, the alert situations. In light green, the ones that are presumably correct but require attention.
Decision logic for the day controls (figure):
• Data file has arrived? yes → Did it have to arrive?
  - yes → Expected day = reference day? yes → 1 – OK (arrived and right day); no → 2 – NOT OK (arrived but wrong day)
  - no → Expected day = reference day? yes → 3 – OK (unexpected file); no → 4 – NOT OK (unexpected file and wrong day)
  - maybe → Expected day = reference day? yes → 5 – OK (maybe file); no → 6 – NOT OK (maybe file and wrong day)
• Data file has arrived? no → Did it have to arrive?
  - yes → 7 – NOT OK (missing file)
  - no → 8 – OK (no file to load)
  - maybe → 9 – OK (maybe file)
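The same logic can be written as a single SQL CASE. This is only a sketch: it assumes three already-computed inputs with illustrative names (arrived_flag = Y/N, expected_cod = Y/N/M, day_ok_flag = Y/N) held in a hypothetical io_day_status source; the real MEF view derives them from its configuration and log tables.

-- Sketch of the decision logic of the figure as a single SQL CASE.
-- Assumed inputs (illustrative names):
--   arrived_flag = 'Y'/'N'       (data file has arrived?)
--   expected_cod = 'Y'/'N'/'M'   (it had to arrive? yes/no/maybe)
--   day_ok_flag  = 'Y'/'N'       (expected day = reference day?)
SELECT CASE
         WHEN arrived_flag = 'Y' AND expected_cod = 'Y' AND day_ok_flag = 'Y' THEN '1 - OK (arrived and right day)'
         WHEN arrived_flag = 'Y' AND expected_cod = 'Y' AND day_ok_flag = 'N' THEN '2 - NOT OK (arrived but wrong day)'
         WHEN arrived_flag = 'Y' AND expected_cod = 'N' AND day_ok_flag = 'Y' THEN '3 - OK (unexpected file)'
         WHEN arrived_flag = 'Y' AND expected_cod = 'N' AND day_ok_flag = 'N' THEN '4 - NOT OK (unexpected file and wrong day)'
         WHEN arrived_flag = 'Y' AND expected_cod = 'M' AND day_ok_flag = 'Y' THEN '5 - OK (maybe file)'
         WHEN arrived_flag = 'Y' AND expected_cod = 'M' AND day_ok_flag = 'N' THEN '6 - NOT OK (maybe file and wrong day)'
         WHEN arrived_flag = 'N' AND expected_cod = 'Y'                       THEN '7 - NOT OK (missing file)'
         WHEN arrived_flag = 'N' AND expected_cod = 'N'                       THEN '8 - OK (no file to load)'
         WHEN arrived_flag = 'N' AND expected_cod = 'M'                       THEN '9 - OK (maybe file)'
       END AS day_control_outcome
  FROM io_day_status;   -- hypothetical source holding the three flags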
• Finally, we must receive via e-mail the result of the processing.
• Using the Micro ETL Foundation we can handle this situation and its control in a few steps.
MEF:
Open the link:
https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k
Go to the Mef_v2 folder and follow the instructions of the readme file.
The data file is in the folder .. dat and is called mtf_export_20160314.csv. The control file with the expected number of rows is called mtf_export_20160314.row. It is also present in the .. dat folder.
The file that configures the data file fields is located in the .. cft folder and is called mtf.csv.
The configuration of the data and control file
• The first step is to insert into a configuration table, which we will call IO_CFT for brevity, all the information that we know about the features of the data file that we load. In this case, you also need to enter in the IO_CFT table the information relating to the control file.
• The second step is to insert in the IO_CFT table the information relative to the expected day of arrival of the data file. We must define a code, let's call it FR_COD (File Reference Code), behind which there will be the loading logic of a second configuration table that we will call IODAY_CFT. The FR_COD code represents the arrival frequency. For the moment, I have defined some commonly used values:
• AD = every day. The data file must arrive every day, so in the IODAY_CFT table all the days will be set.
• AWD = all working days. The data file must arrive only on working days, so all holidays plus Saturdays and Sundays will be null.
• ? = I do not know when it arrives; it is variable. Typical of monthly flows for which no one knows precisely when they will be available.
• Based on the FR_COD code, the IODAY_CFT table will be loaded, setting the presence of the expected day in the FR_YMD field (see the sketch below).
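As an illustration of the AWD idea only (the MEF does this through its own procedures, described later), a minimal sketch that fills the FR_YMD column of the simplified table sketched earlier, leaving Saturdays and Sundays null; a real implementation would also consult a holiday table:

-- Illustrative AWD loading sketch on the simplified ioday_cft_sketch table:
-- FR_YMD is filled only on working days; Saturdays and Sundays (and, in a
-- real system, holidays) stay null.
INSERT INTO ioday_cft_sketch (io_cod, day_ymd, fr_ymd)
SELECT 'MTF',
       TO_CHAR(d.cal_day, 'YYYYMMDD'),
       CASE
         WHEN TO_CHAR(d.cal_day, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH') IN ('SAT', 'SUN')
           THEN NULL
         ELSE TO_CHAR(d.cal_day, 'YYYYMMDD')
       END
  FROM (SELECT DATE '2016-01-01' + LEVEL - 1 AS cal_day
          FROM dual
        CONNECT BY LEVEL <= 366) d;   -- one year of calendar days, as an example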
Reference day configuration
• The third step is to insert in the IO_CFT table the information relating to the expected reference day.
• The DR_COD code must indicate what the reference day of the data in the data file should be. Remember that the reference day must be present or implied. The same logic applied to the FR_COD field also applies to the DR_COD field; it will serve to set the IODAY_CFT table. For the moment I have defined some commonly used values (sketched after this list):
• 0 = the reference day coincides with the current day.
• 1 = the reference day coincides with the day before, that is, the current day -1.
• 1W = the reference day coincides with the first preceding working day.
• The configuration of the IODAY_CFT table occurs only once, in the process of the data file configuration. After that, you no longer need to change it.
• Note that the use of the codes is just a way to quickly facilitate the setting of the IODAY_CFT table. Nothing prevents you from modifying the table manually or with ad-hoc SQL.
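A minimal sketch of the three DR_COD rules for a given calendar day (the real logic lives in the MEF procedures mentioned later). It handles only weekends for 1W; a real implementation would also use a holiday calendar. :day_in and :dr_cod are illustrative bind variables:

-- Illustrative sketch of the DR_COD rules (:day_in is a DATE bind variable,
-- :dr_cod one of '0', '1', '1W'); weekends only, no holiday table.
SELECT CASE :dr_cod
         WHEN '0'  THEN :day_in                            -- same day
         WHEN '1'  THEN :day_in - 1                        -- previous calendar day
         WHEN '1W' THEN CASE TO_CHAR(:day_in, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
                          WHEN 'MON' THEN :day_in - 3      -- back to Friday
                          WHEN 'SUN' THEN :day_in - 2
                          ELSE            :day_in - 1
                        END                                -- previous working day
       END AS expected_reference_day
  FROM dual;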
Configuration of the correction factor
• The OFF_COD code present in IO_CFT indicates the correction factor to be applied to the reference day indicated by the feeding system. OFF_COD does not act in the control, but acts as a corrector of the day at run time. For the moment I have defined some commonly used codes:
• 0 = the reference day coincides with the day indicated by the feeding system.
• 1 = the reference day coincides with the day before, that is, the current day -1.
• 1W = the reference day coincides with the previous working day.
• The FROM_DR_YMD and TO_DR_YMD fields have the same meaning as the FR_COD field, but allow you to identify a range of possible reference days. For the moment, only one code has been defined (see the sketch below):
• PM = the previous month of the current calendar day.
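As a sketch only (the real correction is applied by the MEF at run time, see the notes further on), the OFF_COD = 1 correction and the PM range could be computed like this; :feed_day is an illustrative bind variable holding the day indicated by the feeding system:

-- Illustrative sketch: simple OFF_COD = 1 correction and the PM range
-- (previous month of the current calendar day). :feed_day is a DATE bind.
SELECT :feed_day - 1                                               AS corrected_day_off_1, -- OFF_COD = 1
       TO_CHAR(TRUNC(ADD_MONTHS(:feed_day, -1), 'MM'), 'YYYYMMDD') AS from_dr_ymd,         -- first day of previous month
       TO_CHAR(LAST_DAY(ADD_MONTHS(:feed_day, -1)), 'YYYYMMDD')    AS to_dr_ymd            -- last day of previous month
  FROM dual;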
MEF:
The data file is in the folder .. dat and is called mtf_export_20160314.csv.
The control file with the expected number of rows is called mtf_export_20160314.row. It is also present in the .. dat folder.
The file that configures the data file field structure is located in the .. cft folder and is called mtf.csv.
The configuration file of the data file is called io_mtf.txt and is under the folder .. cft. It has the following settings:
The configuration file
IO_COD: MTF (file identifier)
IO_DEB: Multilateral Trading Facilities (file description)
TYPE_COD: FIN (file type - input file)
SEC_COD: ESM (feeding system: ESMA)
FRQ_COD: D (frequency - daily)
FILE_LIKE_TXT: mtf_export%.csv (generic name of the file without the day)
FILE_EXT_TXT: mtf_export_20160314.csv (name of the sample data file)
HOST_NC: ., (priority on the decimal point)
HEAD_CNT: 1 (number of rows in the header)
FOO_CNT: 0 (number of rows in the tail)
SEP_TXT: , (separator symbol if csv)
START_NUM: 12 (starting character of the day in the name)
SIZE_NUM: 8 (size of the day)
RROW_NUM: 2 (row of the control file that contains the file row count)
RSTART_NUM: 8 (character where the number of rows begins)
RSIZE_NUM: 6 (size of the number; see the sketch after this listing)
MASK_TXT: YYYYMMDD (format of the day)
FR_COD: AWD (file reference code)
DR_COD: 1W (day reference code)
OFF_COD: 1W (offset on the day reference)
RCF_LIKE_TXT: mtf_export%.row (generic name of the control file without the day)
RCF_EXT_TXT: mtf_export_20160314.row (name of the sample control file)
FTB_TXT: NEWLINE (row terminator for the Oracle external table)
TRUNC_COD: 1 (whether the staging table should be truncated before loading)
NOTE_IO_COD: MTF (presence of a notes file)
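A small sketch of how RROW_NUM, RSTART_NUM and RSIZE_NUM are meant to be used on the control file: take row 2 and read 6 characters starting at position 8. Here the row is hard-coded for illustration; in the MEF it would come from reading mtf_export_20160314.row (e.g. through an external table):

-- Illustrative only: extract the expected row count from row 2 of the
-- control file ("ROWS = 160"), 6 characters from position 8
-- (RSTART_NUM = 8, RSIZE_NUM = 6).
SELECT TO_NUMBER(TRIM(SUBSTR('ROWS = 160', 8, 6))) AS expected_rows
  FROM dual;
-- returns 160, to be compared with the number of rows actually loaded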
MEF:
The DR_COD code is managed by the mef_sta_build.p_dr_cod function.
The FR_COD code is managed by the mef_sta_build.p_fr_cod function.
The OFF_COD code is managed by the mef_sta.f_off_cod function. See further details in Recipe 12 on SlideShare.
The functions that handle the day range are mef_sta_build.p_from_dr_cod and mef_sta_build.p_to_dr_cod.
In this way, by changing the functions, we can define other codes. mef_sta_build.p_objday_cft will load the IODAY_CFT table.
The complete configuration of the data file is done by launching the procedure:
SQL> @sta_conf_io MTF
The data file loading
• The process of loading the data file must insert in a log table the information related to the elaboration day and to the reference day received from the feeding system.
MEF:
SQL> exec mef_job.p_run('sta_esm_mtf');
• Comparing, at the end of the loading, what is configured with what is loaded, we can infer a final outcome of the process. This comparison may be displayed by means of a view, which we will call IODAY_CFV (a sketch of such a comparison follows).
• The logic with which the view works was summarized in a previous figure. On the basis of this outcome, an intervention strategy must be agreed upon.
• In our example, launched on a working day, we see that there is a problem related to the reference day.
• There is also another problem to be investigated: the number of rows declared in the control file is different from the number of rows actually loaded.
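A hypothetical sketch of the kind of comparison the IODAY_CFV view performs; the table and column names are illustrative (they reuse the simplified sketches above plus a hypothetical io_load_log holding what the load actually recorded), not the real MEF objects:

-- Illustrative comparison between configured expectations and what was loaded.
-- io_load_log is a hypothetical per-day load log with the reference day
-- received from the feeding system and the declared/loaded row counts.
SELECT c.io_cod,
       c.dr_ymd                                       AS expected_reference_day,
       l.loaded_dr_ymd                                AS received_reference_day,
       CASE WHEN c.dr_ymd = l.loaded_dr_ymd
            THEN 'OK' ELSE 'NOT OK (wrong day)' END   AS day_check,
       l.declared_rows,                               -- rows declared in the control file
       l.loaded_rows,                                 -- rows actually loaded
       CASE WHEN l.declared_rows = l.loaded_rows
            THEN 'OK' ELSE 'NOT OK (row count)' END   AS rows_check
  FROM ioday_cft_sketch c
  JOIN io_load_log      l
    ON l.io_cod  = c.io_cod
   AND l.day_ymd = c.day_ymd;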
Conclusion
• Whatever way we implement an ETL solution, the important point to emphasize is that we need to know, in advance, the time characteristics of the data file that we will load.
• For each calendar day, we must be clear about what we expect to receive on that day and, for any given data file, what reference day we expect to find inside it.
• There can be no doubt or ambiguity: this is information that we need to know in advance and that we have to configure. After the loading of the Staging Area, only the comparison between what we expected to receive and what we actually received will allow us to evaluate the correctness of the loaded data.
• Just remember that this correctness check is a priority, it is the first check, and it refers only to the two time components of the data. Only if these checks are positive will it make sense to continue with the other quality controls.
References
On Slideshare:
the series: Recipes of Data Warehouse and Business Intelligence.
Blog:
http://microetlfoundation.blogspot.it
http://massimocenci.blogspot.it/
Micro ETL Foundation free source at:
https://drive.google.com/open?id=0B2dQ0EtjqAOTQzZSaUlyUmxpT1k
Last version v2.
Email:
massimo_cenci@yahoo.it
