Implementation of TPT
connection in
Informatica
Author: Yagya Dutt Sharma
Mentor: Deepan Chakravarthy Mahadevan
Introduction:
Teradata Parallel Transporter is one example of products working together within an
active data warehouse. This new-generation product simplifies the data loading process by
running the protocols used by each of the Teradata Load and Unload Utilities as modules or
operators: load, update, export and stream.
Unlike conventional utilities and products in which multiple data sources are usually processed
in a serial manner, Teradata Parallel Transporter can access multiple data sources in parallel.
This ability can lead to increased throughput. Teradata Parallel Transporter also allows different
specifications for different data sources and, if their data is UNION-compatible, merges them
together.
Teradata Parallel Transporter was designed for increased functionality and customer ease of use
for faster, easier and deeper integration. The capabilities include:
• Simplified data transfer between one Teradata Database and another; only one script is required to export from the production system and load the test system.
• The ability to load dozens of files using a single script makes development and maintenance of the data warehouse easier.
• Distribution of workloads across CPUs on the load server eliminates bottlenecks in the data load process. Data flows through multiple instances of the UPDATE operator and in-memory data streams to update tables.
• An option is available to export data to an in-memory data stream instead of landing the data.
• The Open Database Connectivity (ODBC) operator reads from the ODBC driver, which can pull data from any database, for example DB2 or Oracle.
• Multiple operators can scan directories for files to load, combine the data in the in-memory data stream with a UNION ALL operation, and load it with the stream operator.
• A script-building wizard is available to aid first-time users.
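The parallel, UNION ALL-style merge described above can be sketched in plain Python. This is a conceptual illustration only, not TPT itself; the sources and row values are hypothetical stand-ins for flat files or ODBC sources:

```python
# Conceptual sketch (not TPT): several UNION-compatible sources are read
# in parallel by producer workers and merged into one in-memory stream,
# the pattern TPT's producer operators and UNION ALL apply natively.
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

def read_source(rows):
    # Stand-in for a producer operator scanning one data source.
    return [tuple(r) for r in rows]

sources = [
    [("1", "alice"), ("2", "bob")],   # e.g. flat file 1
    [("3", "carol")],                 # e.g. flat file 2
    [("4", "dave"), ("5", "erin")],   # e.g. an ODBC source
]

# Producers run in parallel; results are merged UNION ALL style
# (all rows kept, duplicates included) into a single stream.
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    merged = list(chain.from_iterable(pool.map(read_source, sources)))

print(len(merged))  # → 5
```

The sources here must be UNION-compatible (same column layout), mirroring the requirement noted above for merging different data sources.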
Scenario:
An Informatica mapping performing a one-to-one load from a flat file to a stage (intermediate) table over a FastLoad (loader) connection was taking six-plus hours to load 71 million records.
Reason:
The FastLoad connection creates a BTEQ script in the background. FastLoad is fast, but it processes data serially, which is slow for 71 million records. Because the source is a flat file, the staging space on the UNIX server also remains occupied until the load completes. The table below shows the performance of the different connection types.
Connection   No. of Rows   Informatica throughput (rows/sec)   Elapsed time
TPT          71,023,350    16,871                              1 hour 18 mins
Fast Load    71,023,350    2,720                               6 hours 25 mins
Relational   71,023,350    1,438                               13 hours 50 mins
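As a rough sanity check, the elapsed times can be approximated from the row count and the reported throughput (all figures taken from the table above; the implied times land in the same ballpark as the reported elapsed times, with small differences reflecting how Informatica measures throughput):

```python
# Implied elapsed time per connection type: rows / throughput.
# Row count and rows/sec figures come from the benchmark table.
rows = 71_023_350

throughput = {          # Informatica throughput in rows/sec
    "TPT": 16_871,
    "Fast Load": 2_720,
    "Relational": 1_438,
}

for conn, rps in throughput.items():
    hours = rows / rps / 3600
    print(f"{conn:10s} ~ {hours:.1f} hours")
```

This gives roughly 1.2 hours for TPT, 7.3 for Fast Load, and 13.7 for Relational, consistent with the roughly 6x and 12x speedups the table reports for TPT.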
Solution:
Using a TPT connection in these kinds of mappings improves performance, because the TPT connection loads the tables in parallel.
Steps to follow:
I. Open Workflow Manager → Connections → Relational.
II. The window below will appear; select the Teradata PT connection.
III. Enter the connection details for the new connection:
Usage:
In the desired session, use the TPT connection:
a. Under Connections → select Teradata Parallel Transporter.
b. Enter the newly created TPT connection string.
c. Enter the ODBC connection string.
Benefits:
This can reduce the execution time of the ETL flow and improve the performance of the
Informatica server.
Reference:
Self-learning via project work (Change related activity in the project, enhancement).