Technical Architecture
Objectives
ChainSys’ Smart Data Platform enables the business to achieve these critical needs.
1. Empower the organization to be data-driven
2. All your data management problems solved
3. World class innovation at an accessible price
Subash Chandar Elango
Chief Product Officer
ChainSys Corporation
Subash's expertise in the data management sphere
is unparalleled. As the creative & technical brain behind
ChainSys' products, no problem is too big for Subash,
and he has been part of hundreds of data projects
worldwide.
Introduction
This document describes the Technical Architecture of the ChainSys Platform.
Purpose
The purpose of this Technical Architecture is to define the technologies, products, and techniques necessary to develop and support the system, and to ensure that the system components are compatible and comply with enterprise-wide standards and direction.
Scope
The document's scope is to identify and explain the advantages and risks inherent in this Technical Architecture. This document is not intended to address the installation and configuration details of the actual implementation; those details are provided in technology guides produced during the project.
Audience
The intended audience for this document is project stakeholders, technical architects, and deployment architects.
Platform Component Definition

Architecture Goals
The system's overall architecture goals are to provide a highly available, scalable, and flexible data management platform.
A key architectural goal is to leverage industry best practices to design and develop a scalable, enterprise-wide J2EE application, following industry-standard development guidelines.
All aspects of security must be developed and built into the application, based on best practices.
[Figure: Platform component overview]
- Foundation: Security (authentication / authorization / crypto), User Management (users / groups, roles / responsibilities, access manager), Base Components (workflow, versioning, notification, logging, scheduler, object manager), Gateway Component (API gateway)
- Smart Data Platform: Data Quality Management, Master Data Governance, Analytical MDM (Customer 360, Supplier 360, Product 360), Data Migration, Setup Migration, Test Data Prep, Big Data Ingestion, Data Archival, Data Reconciliation, Data Integration, Data Masking, Data Compliance (PII, GDPR, CCPA, OIOO), Data Cataloging, Data Analytics, Data Visualization
- Smart Business Platform: autonomous regression testing, load and performance testing, and a Rapid Application Development (RAD) framework with a visual development approach (drag & drop design tools, functional components into visual workflow)
Platform Foundation
The Platform Foundation forms the base on which the entire Platform is built. The major components that create the Platform are described in brief.

[Figure: Platform Foundation components]
- Security Management: federated authentication (JWT, SAML, OAuth 2.0), platform/credential authentication, AD and LDAP; an authentication service (credential and SSO authenticators); an authorization engine (org/license, app/node, and access authorization); and a crypto engine (MD5, SHA1, AES 128, AES 256)
- User Management: users, user groups, responsibilities, role hierarchy, object access manager
- Base Components: workflow (approvals, activities, SLA), logging constructs (application, execution, and audit logs), collaborate (email, web notification, chats), versioning (SVN, GIT, database), scheduler (job scheduler, job feedback), and the platform object manager (object sharing, dependent sharing, sharing manager)
- Gateway Component: Platform API, API gateway engine, login API, REST API publisher, SOAP service publisher
User Management
The component manages all the Roles, Responsibilities, Hierarchy, Users, and User Groups.

Responsibilities
The Platform comes with preconfigured Responsibilities for dataZap, dataZen, and dataZense. Organizations can customize Responsibilities, which are assigned to platform objects with additional privileges.

Roles
The Platform comes with predefined Roles for dataZap, dataZen, and dataZense, and organizations can create their own Roles. A role-based hierarchy is also configured for the user-level hierarchy, and roles are assigned the default responsibilities.

Users
Users are assigned only the applications necessary for them, and each User is given a Role. The hierarchy is formed using the role hierarchy setup, where a manager from the next role is assigned. The responsibilities for these roles are set by default for the users; a User can be given additional responsibilities, or have an existing responsibility against a role revoked. Users gain access to objects based on the privileges assigned to the responsibility.
Security Management
The security management component takes care of the following.

SSL
The Platform is SSL/HTTPS enabled on the transport layer with TLS 1.2 support. SSL is applied to the nodes exposed to users, such as the DMZ and Web nodes, and to the nodes exposed to third-party applications, such as the API Gateway nodes.

Authentication Engine
The Platform offers credential-based authentication handled by the Platform itself, as well as Single Sign-On based federated authentication. SSO and credential authentication can co-exist for an organization.

Credential Authentication
User authentication on the Platform happens with the supplied credentials. All successful sessions are logged, and failed attempts are tracked at the user level for locking the user account. A user can have only one web session at any given point in time. The password policy, including expiry, is configured at the Organization level and applies to all users. The enforced password complexity rules include:
- Minimum length
- Maximum length
- Usage of numbers, cases, and special characters
- The number of unsuccessful attempts before lockout is also configurable

Single Sign-On
SSO can be set up with federated services like SAML, OAuth 2.0, or JWT (JSON Web Tokens). The setup for an IdP is configured against the organization profile, and authentication happens in the IdP. This can be either IdP initiated or SP (ChainSys Smart Data Platform) initiated. Organization users with SSO get a different context to log in.
Authorization Engine
The first level of authorization is the Organization License; the licensing engine is also used to set up the organization for authentication. The next level of authorization is the applications assigned to the Organization and the respective User; the individual application nodes are assigned to the organization as per the service agreement to handle load balancing and high availability.
Authorization of pages happens through the responsibilities assigned to the users.
Authorization of a record happens through sharing the record with a group or individual users.
Responsibility and sharing carry the respective privileges to the pages and records.
On conflict, the principle of least privilege is used to determine the access.

Crypto Engine
The Crypto Engine handles both asymmetric encryption and hashing algorithms.
AES 128 is the default encryption algorithm; 256-bit keys are also supported.
The keys are managed within the Platform at the organization level. The usage of keys maintained at the application level determines how they are used for encryption and decryption.
All internal passwords are stored by default with MD5 hashing.
Encryption of the stored data can be done at the Database layer as needed.

Workflow Engine
The workflow engine manages the orchestration of the flow of activities. It is part of the platform foundation and is extended by the applications to add application-specific activities.

Version Management
This component handles the versioning of objects and records eligible for versioning. The foundation has the API to version the objects and their records, which can be extended by the applications to add specific functionality. Currently, the Platform supports SVN as the default and also supports database-level version management. Support for GIT is on the roadmap.

Notification Engine
The notification engine performs all the notifications to the User in the system. It notifies users on the page when they are online in the application. Other notifications, like mail notifications and chat notifications, are also part of this component.

Logging Engine
All activity logs, for both the foundation and the applications, are handled to aid understanding and debugging.
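To illustrate the kind of operation the Crypto Engine performs, here is a minimal AES sketch using the standard javax.crypto API. The Platform documents AES 128 as its default; the GCM mode and the in-memory key handling shown here are illustrative assumptions, not its documented implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesSketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);                 // AES 128 default; 256-bit keys also supported
        SecretKey key = keyGen.generateKey();

        byte[] iv = new byte[12];         // fresh IV for every encryption
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding"); // mode is an assumption
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("sensitive value".getBytes(StandardCharsets.UTF_8));

        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        System.out.println(new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8));
    }
}
```

In the Platform itself, keys live at the organization level rather than being generated per call as above.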
API Gateway Engine
The API Gateway forms the foundation for publishing and consuming services with the Platform. All eligible jobs or actions can be published for external applications to access. The following components form the publishing node.

Login Service
The login service authenticates whether the requesting consumer has the proper credentials to invoke the job or action. The publisher engine has two methods of authentication:
- Inline authentication, where every request carries the credentials for authentication and access control.
- Session authentication, where this service is explicitly invoked to get a token, and the other published services use this token to authorize the request.

SOAP Service
Eligible jobs or actions can be published using the Simple Object Access Protocol (SOAP). SOAP is a messaging protocol that allows programs running on disparate operating systems to communicate using Hypertext Transfer Protocol (HTTP) and Extensible Markup Language (XML).

REST Service
Eligible jobs or actions can be published using Representational State Transfer (REST). REST communicates over HTTP like SOAP and can carry messages in multiple formats; dataZap publishes in XML or JSON (JavaScript Object Notation) format.

Scheduler

Scheduler Creation
The scheduler enables you to schedule a job once or on a recurring basis; recurring jobs can be scheduled minutely, hourly, weekly, or monthly.

Scheduler Execution
The scheduler execution engine uses the configuration and fires the job in the respective application. The next job is scheduled at the end of each job as per the configuration.

Job Monitoring
Scheduled jobs are monitored, and their progress and status are tracked at every stage. If an expected job is delayed or unexpected errors occur, the responsible users are notified for action.
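The session-authentication flow described under the Login Service can be pictured with the JDK 11 HttpClient: obtain a token from the login service, then pass it when invoking a published job. The endpoint paths, parameter names, and token format are hypothetical placeholders, not the gateway's actual contract.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GatewaySessionAuthSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Step 1: invoke the login service explicitly to obtain a session token.
        HttpRequest login = HttpRequest.newBuilder()
                .uri(URI.create("https://gateway.example.com/api/login"))    // hypothetical URL
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"username\":\"svc_user\",\"password\":\"***\"}"))
                .build();
        String token = client.send(login, HttpResponse.BodyHandlers.ofString()).body();

        // Step 2: call a published REST job, authorizing the request with the token.
        HttpRequest job = HttpRequest.newBuilder()
                .uri(URI.create("https://gateway.example.com/api/jobs/run")) // hypothetical URL
                .header("Authorization", "Bearer " + token)
                .POST(HttpRequest.BodyPublishers.ofString("{\"jobId\":\"demo\"}"))
                .build();
        System.out.println(client.send(job, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}
```

With inline authentication, the credentials would instead accompany every request.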
dataZap Component
The Execution Handler is available both on the client side and in the cloud, to handle pure cloud environments and to manipulate data in the cloud for less load at the client end. The Execution Controller is available in the cloud to direct the Execution Handler at every step.
[Figure: dataZap component architecture — endpoint connectors (JCO, JDBC, REST, SOAP, OData) link relational databases, cloud applications, big data lakes, NoSQL databases, enterprise storage systems, message brokers, and enterprise applications to the extract adapter (data object engine, filter engine, child iterator, CDC engine, crypto engine, data stream service), the dataflow adapter (mapper with lookup / sequence / expression transformations; active transformations: aggregator, sorter, unifier, joiner, normalizer, router; localized transformation, TAPI, comparator), and the load adapter (ingestion engine, crypto engine, passive transformation, validation engine, reprocessing engine, reconciliation engine, pre-load and post-load). Supporting components: process flow adapter, scheduler (job initiation, exception notification), reconciliation adapter (comparator, visualization API), versioning engine (object versioning), reporting engine, API gateway (REST API publisher), migration flows (master/transaction and setup migration), and BOTS playback/builder objects.]
Endpoint Connectors
The component has all the base connectors used to connect to most endpoint applications. The base connectors include:
- JDBC - for all RDBMS connections
- SAP JCo - for connecting to SAP systems
- SOAP - connects to applications enabled with SOAP APIs
- REST - connects to applications enabled with REST APIs
- OData - connects to applications enabled with OData APIs
- FTP - to connect and extract data from files on FTP sites
- NoSQL - to connect to NoSQL databases like MongoDB
- Message Broker - to connect to messaging services like ActiveMQ and IBM MQ
Connections through all of the above base connectors can be secured (SSL/TLS) based on the endpoint configuration.
The specific connectors for enterprise applications are wrappers built over these base connectors, with specific security and governance applied as per the application's needs. The diagram shows a few of the existing wrapper endpoints created for the enterprise applications in the market.
For applications that do not have specific connectors, the base connectors can be used, provided the application has no particular authentication methods beyond the base-level authentication provided. ChainSys will build application-specific connectors if they do not already exist.
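As a concrete picture of what the JDBC base connector fundamentally does, here is a minimal sketch that opens a connection to an RDBMS endpoint and pulls rows. The URL, credentials, and table are placeholders; the Platform's wrapper connectors add endpoint-specific security and governance on top of this.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcConnectorSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://source-host:5432/erp"; // placeholder endpoint
        try (Connection conn = DriverManager.getConnection(url, "extract_user", "***");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM customers")) {
            while (rs.next()) {
                System.out.println(rs.getLong("id") + " | " + rs.getString("name"));
            }
        } // try-with-resources closes the result set, statement, and connection
    }
}
```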
Load Adapter
The component that handles the loading of data into multiple systems or endpoints.

Ingestion Engine
Initial data marting happens in this engine, and the data is then manipulated there.

Crypto Engine
The Crypto Engine enables encrypting the data during the data marting process, ensuring the data is protected in all formats. It also applies decryption before loading the data to the final target endpoint.

Reconciliation Engine
The reconciliation engine handles the technical reconciliation of the data in two stages: at the end of the pre-validation stage, to determine the differences between the raw data and the transformed data, and after loading completes, to determine the differences between the loaded data and the raw or changed data.

Reprocessing Engine
This component helps correct errored data, both at the pre-validation level and at the post-load level. Data fixes can be handled both online and offline: users can download the error data as an Excel file and upload the corrected data as a bulk update. In addition to error correction, data can be enhanced or constructed so that it passes the validation step for quality.

Loading Engine
The loading engine is where the application understands the endpoint type and uses one of the loading engines to load the data into the target application. It also has special adapters to use the Playback Adapters of the Smart BOTS and Smart App Builder in the business platform.

Transformation

Passive Transformation
This transformation only changes the values of columns from one form to another. Different transformation types, such as lookup, sequence, and expression-based transformations, can be performed.

TAPI
TAPI helps create reusable transformations (APIs) used across multiple objects, so a change is made in one place rather than in numerous areas.
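The three passive transformation types named above can be sketched in a few lines. Each maps a single column value to a new value without changing the row count; the class shapes below are illustrative assumptions, not dataZap's internal design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

public class PassiveTransformSketch {
    public static void main(String[] args) {
        // Lookup transformation: map source codes to target values.
        Map<String, String> countryLookup = new HashMap<>();
        countryLookup.put("US", "United States");
        countryLookup.put("DE", "Germany");

        // Sequence transformation: generate surrogate keys.
        AtomicLong sequence = new AtomicLong(1000);

        // Expression transformation: a value-level expression (trim + upper-case).
        UnaryOperator<String> expression = v -> v.trim().toUpperCase();

        for (String raw : new String[] {" us ", " de "}) {
            String code = expression.apply(raw);                  // expression
            String name = countryLookup.getOrDefault(code, code); // lookup
            long key = sequence.incrementAndGet();                // sequence
            System.out.println(key + " | " + code + " | " + name);
        }
    }
}
```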
Extract Adapter
The extract adapter component retrieves data from many different types of endpoints and processes it into the expected format.

Data Object Engine
This engine handles almost all kinds of systems and formats for retrieving data. It can work with SQL, flat files, and SOAP and REST services.

Filters
Filters reduce the number of rows from the raw data extracted from the source.

Child Iterator
This component handles the master-child relationship between the data extracts, so that a filter applied on the master propagates down to all the child levels.

Crypto Engine
The Crypto Engine reads or extracts the data with encryption applied over the fields selected for extraction. This helps keep the data encrypted against access from either the front end or the back end.

Data Streaming Service
Here the data extracted by the data object or the extract adapter is streamed to the applications calling the service to pull the data from the endpoint.

Changed Data Capture
This component gets the changed data from the source. Two modes can achieve this. The recommended option is assigning a date field used to bring in the changed data (see the JDBC sketch after the next subsection). There is also an option to bring in data by comparing records, but this is resource-intensive and is not recommended unless there is no date field to compare.

Active Transformation
An active transformation affects the number of rows. The available active transformations are the Normalizer, Joiner, Router, Unifier, Aggregator, and Sorter. The active transformation engines convert the data structures from the source to the target. The rules engine (Router) moves data to different endpoints according to the rules, and data can be compared between two systems to determine an action before moving it.
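The recommended date-field CDC mode boils down to filtering on a last-update timestamp. Below is a minimal JDBC sketch; the table, column, and watermark bookkeeping are placeholders rather than the Platform's actual mechanism.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;

public class DateFieldCdcSketch {
    public static void main(String[] args) throws Exception {
        Instant lastRun = Instant.parse("2024-01-01T00:00:00Z"); // persisted watermark (placeholder)
        String url = "jdbc:postgresql://source-host:5432/erp";

        try (Connection conn = DriverManager.getConnection(url, "extract_user", "***");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name, last_update_date FROM customers WHERE last_update_date > ?")) {
            ps.setTimestamp(1, Timestamp.from(lastRun));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Only rows changed since the last successful extract are returned.
                    System.out.println(rs.getLong("id") + " changed at "
                            + rs.getTimestamp("last_update_date"));
                }
            }
        }
    }
}
```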
Dataflow Adapter
The dataflow adapter helps in transforming and mapping the data from multiple sources to multiple target systems.

Migration Flow
This component overrides the workflow component in the foundation. The migration flow engine is specific to migrating Master or Transaction Data. It has orchestration capabilities and human-intervention capabilities like Approval, User Confirmation, and Receive Input.

Process Flow Engine
This component overrides the workflow component in the foundation. The process flow engine is specific to data movement. It has all the orchestration capabilities and human-intervention capabilities like Approval, User Confirmation, and Receive Input.

Scheduler
These components provide the job execution agents specific to dataZap that are executed by the base scheduling engine. They are wrappers for the data movement components: Load Adapters, Extract Adapters, Dataflow Adapters, and Process Flows.

API Gateway
These are execution agents for the publisher in the foundation. They form the wrapper for the jobs that need to be executed in the data movement components: Load Adapters, Extract Adapters, Dataflow Adapters, and Process Flows.
Reconciliation Adapter
The reconciliation adapter generates the query to compare the data and produces the result through the Visualization API to create the necessary reconciliation dashboard.

Reporting Engine
The reporting engine generates reports on the various adapters' executions and produces dashboards to understand the actions taken and still to be taken.
System Technology Landscape
[Figure: System technology landscape — DMZ nodes (Apache HTTPD server: web load balancing, reverse proxy, forward proxy, single sign-on), web application nodes (Apache Tomcat 9, JDK 11), collaborate server (Apache ActiveMQ), API gateway, foundation nodes (caching node, scheduler node, file/log server), visualization and analytics stack (dimple.js, R Analytics, NodeJS 12.16, Ionic V4, Selenium WebDriver), and default data stores (metadata store, versioning store, data mart, indexing store, app data store, Apache CouchDB).]
Apache HTTPD
The Apache HTTPD server routes calls to the Web nodes and handles load balancing for both the Web Server nodes and the API Gateway nodes. The following Apache HTTPD features are used:
- Highly scalable
- Forward / reverse proxy with caching
- Multiple load balancing mechanisms
- Fault tolerance and failover with automatic recovery
- WebSocket support with caching
- Fine-grained authentication and authorization access control
- Loadable dynamic modules, such as ModSecurity for WAF
- TLS/SSL with SNI and OCSP stapling support

Single Sign-On
This node is built on the Spring Boot application with Tomcat as the servlet container. Organizations opting for single sign-on have a separate SSO node with a particular context; the default context takes users to platform-based authentication.

DMZ Nodes
These nodes are generally the only nodes exposed to the external world outside the enterprise network. The two nodes in this layer are the Apache HTTPD server and the Single Sign-On node.

Web Nodes
This layer consists of the nodes exposed to users for invoking actions through the frontend, or to third-party applications as APIs. The nodes in this layer are the Web Server that renders the web pages, the API Gateway through which other applications interact with the application, and the Collaborate node for notifications.
Web Server
The web application server hosts all the web pages of the ChainSys Platform.
- Apache Tomcat 9.x is used as the servlet container.
- JDK 11 is the JRE used for the application. The Platform works on OpenJDK, Azul Zulu, AWS Corretto, and Oracle JDK.
- Struts 1.3 is used as the controller framework.
- Integration between the web server and the application nodes is handled with microservices based on Spring Boot.
- The presentation layer uses HTML 5 / CSS 3 components and scripting frameworks like jQuery, d3.js, etc.
- The web server can be clustered to n nodes according to the number of concurrent users and requests.

Gateway Node
This node uses all the default application services.
- It uses Jetty to publish the APIs as SOAP or REST APIs.
- The API Gateway can be clustered based on the number of concurrent API calls from external systems.
- Denial of Service (DoS) protection is implemented in both JAX-WS and JAX-RS to prevent attacks.

Collaborate
This node handles all the different kinds of notifications to users, such as front-end notifications, emails, and push notifications (on the roadmap). It also provides chat services that the applications can use as needed.
- The notification engine uses Netty APIs for sending notifications from the Platform.
- Apache ActiveMQ is used for messaging the notifications from the application nodes.
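Since the web tier talks to the application nodes through Spring Boot microservices, a minimal sketch of such a service is shown below. The endpoint path and payload are hypothetical, not the Platform's actual service contract.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// A minimal Spring Boot microservice of the kind the web server could call
// on an application node; the endpoint below is a hypothetical illustration.
@SpringBootApplication
@RestController
public class ApplicationNodeServiceSketch {
    @GetMapping("/api/health")
    public String health() {
        return "{\"status\":\"UP\"}";
    }

    public static void main(String[] args) {
        SpringApplication.run(ApplicationNodeServiceSketch.class, args);
    }
}
```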
Application Nodes
The application nodes are Spring Boot applications that communicate with the other application nodes and the web servers.
- Load balancing is handled by HAProxy, based on the number of nodes instantiated for each application.
- JDK 11 is the JRE used for the application. The Platform works on OpenJDK, Azul Zulu, AWS Corretto, and Oracle JDK.

Application-specific notes:
- dataZap uses only the default services mentioned above.
- dataZen uses only the default services mentioned above.
- dataZense (Analytical Services / Catalog Services) uses all the default services mentioned above; in addition, it uses R analytics for Machine Learning algorithms, and D3 and Dimple JS for the visual layer.
- Smart BOTS uses all the default services mentioned above, plus the Selenium API and Sikuli for web-based automation.
- Smart App Builder uses all the default services mentioned above to configure the custom applications and to generate dynamic web applications as configured. The mobile applications' service needs NodeJS 12.16, which uses the Ionic Framework V4 to build the web and mobile apps for the configured custom applications.
Data Storage Nodes

Database
The ChainSys Platform supports both PostgreSQL 9.6 or higher and Oracle 11g or higher for:
- The metadata of the setups and configurations of the applications
- Data marting for the temporary storage of the data
The Platform uses PostgreSQL for the metadata in the cloud. PostgreSQL is a highly scalable database:
1. It is designed to scale vertically by running on bigger and faster servers when you need more performance.
2. It can be configured for horizontal scaling; Postgres has useful streaming replication features, so multiple replicas can be created and used for reading data.
3. It can easily be configured for High Availability based on the above.

Scheduler Node
This node uses only the default application node services. It can be clustered only as failover nodes: when the primary node is down, HAProxy makes the secondary node the primary. The secondary node handles notifications and the automatic rescheduling of jobs; it calls each schedulable application object so that all possible exception scenarios are addressed. Once the failed node is up and running again, it becomes the secondary node.
Cache Server
Redis is used for caching the platform configuration objects and execution progress information. This avoids network latency across the database and thus increases the performance of the application.
When durability of the data is not needed, the in-memory nature of Redis allows it to perform well compared to database systems that write every change to disk before considering a transaction committed.
The component is set up as a distributed cache service to enable better performance during data access. Redis can be clustered for high availability and supports master-replica replication.
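The caching pattern described above looks roughly like the following sketch, written with the Jedis client (the Platform's actual Redis client library is not specified here). The host, key, and TTL are placeholders.

```java
import redis.clients.jedis.Jedis;

public class ConfigCacheSketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("cache-host", 6379)) {
            // Cache a configuration object with a TTL instead of re-reading it
            // from the metadata database on every request.
            jedis.setex("org:42:config", 300, "{\"theme\":\"dark\"}");

            String cached = jedis.get("org:42:config"); // avoids a database round trip
            System.out.println(cached != null ? cached : "cache miss -> load from PostgreSQL");
        }
    }
}
```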
The multi-tenant database architecture is designed based on the following:
- A separate database for each tenant
- Trusted database connections for each tenant
- Secure database tables for each tenant
- Easily extensible custom columns
- Scalability handled through single-tenant scale-out

PostgreSQL offers encryption at several levels and provides flexibility in protecting data from disclosure due to database server theft, unscrupulous administrators, and insecure networks. Encryption might also be required to secure sensitive data. The available options include:
- Password storage encryption
- Encryption for specific columns
- Data partition encryption
- Encrypting passwords across a network
- Encrypting data across a network
- SSL host authentication
- Client-side encryption
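One way to realize "encryption for specific columns" in PostgreSQL is the pgcrypto extension; a sketch is below. This illustrates the general approach rather than the Platform's concrete mechanism, and it assumes CREATE EXTENSION pgcrypto; has been run and a bytea column exists for the protected value.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ColumnEncryptionSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://db-host:5432/appdata"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "app_user", "***");
             PreparedStatement ps = conn.prepareStatement(
                     // pgp_sym_encrypt is a pgcrypto function; ssn is a bytea column
                     "INSERT INTO customers (name, ssn) VALUES (?, pgp_sym_encrypt(?, ?))")) {
            ps.setString(1, "Acme Corp");
            ps.setString(2, "123-45-6789");   // plaintext value to protect
            ps.setString(3, "org-level-key"); // key management is the hard part in practice
            ps.executeUpdate();
        }
    }
}
```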
[Figure: Object groups managed by versioning — loader adapters, data objects, data extracts, data flows, process flows, migration flows, and reconciliations; data models, rules, augmentations, and workflows; data sets, views, dashboards, and ad-hoc reports; object models, layouts, and workflows.]
File Log Server
This component is used for centralized logging: it stores the application logs, execution logs, and error logs of the platform applications on a common server. Log4J is used for distributed logging. These logs can be downloaded for monitoring and auditing purposes; a small HTTP service allows users to download the files from this component. It is implemented with the single-tenant scale-out approach.
Subversion (SVN) Server
Apache Subversion (abbreviated as SVN) is a software versioning and revision control system distributed
as open-source under the Apache License. The Platform uses SVN to version all the metadata
configurations to revert in the same instance or move the configurations to multiple instances for
different milestones. All the applications in the Platform use the foundation APIs to version their objects
as needed.
Apache SOLR
The ChainSys Platform uses SOLR for its data cataloging needs as an indexing and search engine. Solr is an open-source enterprise search platform. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features, and rich document handling. Apache Solr was chosen over the alternatives for the following reasons.

Real-Time, Massive Read and Write Scalability
Solr supports large-scale, distributed indexing, search, and aggregation/statistics operations, enabling it to handle both large and small applications. Solr also supports real-time updates and can take millions of writes per second.

SQL and Streaming Expressions/Aggregations
Streaming expressions and aggregations provide the basis for running traditional data warehouse workloads on a search engine, with the added enhancement of basing those workloads on much more complex matching and ranking criteria.

Security Out of the Box
With Solr, security is built in, integrating with systems like Kerberos, SSL, and LDAP to secure the deployment and the content inside of it.

Fully Distributed Sharding Model
Solr moved from a master-replica model to a fully distributed sharding model in Solr 4, focusing on consistency and accuracy of results over other distributed approaches.

Cross-Data Center Replication Support
Solr supports active-passive CDCR, enabling applications to synchronize indexing operations across data centers in different regions without third-party systems.

Solr Is Highly Big Data Enabled
Users can store Solr's data in HDFS. Solr integrates nicely with Hadoop's authentication approaches and leverages ZooKeeper to simplify its fault-tolerance infrastructure.

Documentation and Support
Solr has an extensive reference guide that covers the functional and operational aspects of every version.

Solr and Machine Learning
Solr is actively adding capabilities to make Learning to Rank (LTR) out-of-the-box functionality.
Apache CouchDB
The ChainSys Platform uses CouchDB for mobile applications in the Application Builder module. PostgreSQL is the initial entry point for the Dynamic Web Applications; the data in PostgreSQL syncs with CouchDB if mobile applications are enabled. In contrast, the initial entry point for the Dynamic Mobile Applications is PouchDB: CouchDB syncs with the PouchDB on the mobile devices, which then syncs with PostgreSQL.
The main features for choosing CouchDB are:
- CouchDB uses HTTP and REST as its primary means of communication, so client apps can talk to the database directly.
- The Couch Replication Protocol lets data flow seamlessly between server clusters, mobile phones, and web browsers, enabling a compelling offline-first user experience while maintaining high performance and strong reliability.
- CouchDB was designed from the bottom up to enable easy synchronization between different databases.
- CouchDB uses JSON as its data format.
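Replication, which the text above highlights, is driven entirely over CouchDB's documented HTTP API; the _replicate endpoint triggers a one-off replication. The host, database names, and the absence of authentication below are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CouchReplicationSketch {
    public static void main(String[] args) throws Exception {
        // Ask the local CouchDB to replicate a database to a remote target.
        String body = "{\"source\":\"app_layouts\","
                + "\"target\":\"http://replica-host:5984/app_layouts\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://couch-host:5984/_replicate")) // placeholder host
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```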
Deployment at Customer: Distributed Mode
The ChainSys Smart Data Platform is a highly distributed application with a highly scalable environment. Most of the nodes are horizontally and vertically scalable.
[Figure: Distributed deployment topology — a DMZ Services VM (Apache HTTPD server, single sign-on, load balancer) fronting a web container cluster (web page services, collaborate services, API gateway; nodes 1..n), a foundation services cluster (caching node, file/log services, scheduling services with primary and secondary nodes), Smart Data Platform application clusters (service nodes 1..n, including design & process, layout build, and layout rendering services), and a database layer (database cluster with primary and secondary metadata/datamart nodes, a versioning VM, a SOLR cluster with master and slave nodes and multiple cores, and a CouchDB cluster).]
DMZ Nodes
Apache HTTPD would be needed in a distributed environment as a load balancer. This would also be
used as a reverse proxy for access outside the network. This would be a mandatory node to be available.
SSO Node would be needed only if there is a need for the Single-Sign-On capability with any federated
services.
Web Cluster
ChainSys recommends a minimum two-node web cluster to handle high availability and load balancing for better performance. This is a mandatory node to be deployed for the ChainSys Platform.
The number of nodes is not restricted to two and can be scaled as per the concurrent usage of the application pages.
The Collaborate node is generally a single node, but it can be configured for High Availability if needed.

Gateway Cluster
The API Gateway nodes are not mandatory to deploy; they are required only when the application APIs need to be exposed outside the Platform.
When deployed, ChainSys recommends a two-node cluster to handle high availability and load balancing under high API call volumes.
The number of nodes in the cluster can be determined based on the volume of API calls and is not restricted to two.
Application Cluster
The HAProxy or Apache HTTPD acts as the load balancer. All the calls within the application nodes are
handled based on the node configuration. If the Apache HTTPD is used in the DMZ for Reverse Proxy,
it is recommended to have HAProxy for internal routing or a separate Apache HTTPD.
The number of nodes in the cluster is not restricted to two. Individual application nodes can be scaled
horizontally for load balancing as per the processing and mission-critical needs.
Integration Cluster is a mandatory node that will be deployed in the Platform. All the other applications
depend on this application for all the integration needs.
Visualization Cluster is also a mandatory node that will be deployed in the Platform. All the other
applications depend on this application for all the dashboard report needs.
Data Storage Nodes
Generally, the PostgreSQL database is configured for High Availability as an Active-Passive instance. Depending on the number of read-write operations, it can be load balanced too. It can be replaced by Oracle 11g or higher if the client wants to use an existing database license.
File Server would be needed only if there is no NAS or SAN availability to mount the same disk space
into the clusters to handle the distributed logging. The NFS operations for distributed logging would
require this Node.
SVN server would be mandatory to store all the configuration objects in the repository for porting from
one instance to the other. Generally, it would be a single node as the operation on this would not be
too high.
REDIS is used as a cache engine. It is mandatory for distributed deployment. This can be configured for
high availability using the master-slave replication.
SOLR would be needed only if data cataloging is implemented, and search capability is enabled.
This can be configured for High Availability. SOLR sharding can be done when the Data is too large for
one Node or distributed to increase performance/throughput.
CouchDB would be needed only if dynamic mobile applications are to be generated. CouchDB can be
configured for high availability. For better performance, Chainsys recommends having individual
instances of CouchDB for each active application.
The visualization node uses RStudio Server for Machine Learning capabilities. It is needed only when Machine Learning algorithms are to be used.
When deploying the MDM, the "Smart Application Builder" node is needed for dynamic layout generation and augmentation. The reverse does not apply, as "Smart Application Builder" is not dependent on the MDM nodes.
NodeJS is needed only when mobile applications are to be dynamically generated. The Apache HTTPD server handles its load balancing.
The Scheduler cluster is needed if even one of the applications uses the scheduling capability. The cluster provides only High Availability (failover) and is not load balanced; the number of nodes is restricted to two.
Deployment at Customer: Single Node

[Figure: Single-node deployment — a DMZ Services VM (Apache HTTPD server, single sign-on) in front of one Application Services VM running the Foundation Package (web services on Apache Tomcat 9, Apache ActiveMQ, foundation services, caching service, scheduling services, file/log server, collaborate services) together with the Smart Data Platform, Smart App Builder (design & process, layout build & render), Smart BOTS, analytics, and catalog services, plus separate Indexing, NoSQL, Versioning, and Database (metadata/datamart) VMs.]
"Single Node" is not meant literally: all application services of the ChainSys Platform are deployed on a single node or server, while the data storage nodes remain separate servers or nodes. This type of installation is generally for a patching environment without too many operations. It is also recommended for non-mission-critical development activities where high availability and scalability are not determining factors.
DMZ Nodes
Apache HTTPD would be needed only if a reverse proxy is required for access outside the network.
This is not a mandatory node for a Single Node installation.
SSO Node would be needed only if there is a need for the Single-Sign-On capability with any federated
services.
Application Server
There is just one Apache Tomcat instance as the web application service, and it is not configured for high availability.
The Collaborate service includes Apache ActiveMQ and the Spring integration service.
The API Gateway is required only if objects are to be published as REST APIs or SOAP services; this service can be shut down if not needed.
The Integration Service, Visualization Service, and Scheduler Service are mandatory services that keep running.
The rest of the applications can be running or shut down depending on the license and need.
Data Storage Nodes
PostgreSQL would be in a separate node. Chainsys does not recommend having the applications and the
Databases on the same machine.
SVN server would be mandatory to store all the configuration objects in the repository for porting from
one instance to the other.
SOLR would be needed only if data cataloging is implemented, and search capability is enabled.
CouchDB would be needed only if dynamic mobile applications are to be generated as a separate node.
Deployment at Customer: Instance Strategy
Generally, the instance propagation strategy below is recommended. Depending on the applications in use and the load, either a single-node deployment or a distributed deployment model can be chosen; a distributed deployment is generally recommended for Production instances. The adapters are forward propagated using the SVN repository.
All instances need not follow the same deployment model. For reverse propagation, for example from Production to Non-Production instances, the application and data storage layers can be cloned and the node configurations re-configured for the lower instances.

[Figure: Instance strategy — DEV, TST/QA, and PRD instances, each with its own Meta DB.]

Built-in configuration management provides check-in and check-out without leaving the ChainSys Platform:
- It gives a great software development lifecycle process for your projects.
- All your work is protected in a secure location and backed up regularly.
[Figure: Cloud deployment topologies — Private Cloud: per-tenant Dev and Prod DMZ subnets (Apache HTTPD servers), application subnets, and data subnets. Public Cloud: two virtual networks with shared Dev and Prod DMZ and application subnets and per-tenant data subnets. In both models, each tenant's on-premise network connects through a per-tenant gateway over a site-to-site tunnel.]
Pure Cloud Deployment
The ChainSys Platform is available on the cloud. The Platform is hosted as a Public Cloud and also offers Private Cloud options.
Public Cloud
Connectivity to the customer data center is handled through site-to-site tunneling between the tenant's data center and the ChainSys data center. Individual gateway routers can be provisioned per tenant.
Tenants share the same application and DMZ node clusters, except for the data storage nodes. If a tenant needs a separate application node for higher workloads, that particular application node set can be dedicated to the specific tenant.
As mentioned earlier in the Database section, multi-tenancy is handled at the database level: tenants have separate database instances, and the databases are provisioned based on the license and the subscription.
Depending on the workload on the nodes, each node can be clustered to balance the load.
Private Cloud
Customers (tenants) have all the application, DMZ, and data storage nodes assigned to the specific tenant; none of them are shared. Depending on the workload on the nodes, each node can be clustered to balance the load. The application nodes and databases are provisioned based on the license and subscription.
Hybrid Cloud Deployment
This can be combined with either the Private or Public cloud. An Agent is deployed on the client organization's premises or data center to access the endpoints. This avoids creating a site-to-site tunnel between the client data center and the ChainSys cloud data center.
There is a proxy (Apache HTTPD Server) on both sides, at the ChainSys data center and the client data center. All back-and-forth communication between the ChainSys data center and the Agent is routed through the proxy only. The ChainSys cloud sends instructions to the Agent to start a task along with the task information; the Agent executes the task and sends the response back to the cloud with the task's status.
The Agents (for dataZap, dataZense, and Smart BOTS) are deployed as needed.
For dataZap, the existing database (either PostgreSQL or Oracle) can be used for the staging process. The Agent executes all integration and migration tasks by connecting directly to the source and target systems, validating and transforming data, and transferring data between them.
For dataZen and Smart App Builder, data is streamed to the ChainSys Platform for manipulation.
[Figure: Hybrid cloud deployment — the ChainSys data center (DMZ nodes with Apache HTTPD and single sign-on, web nodes on Apache Tomcat 9, collaborate server, API gateway, application cluster nodes for analytics, catalog, design & process, and layout build services, scheduler node, file/log server, caching node, and data stores: metadata, versioning, data mart, indexing, app data, CouchDB) communicates through proxies with the client data centre, where the agent executables (dataZap, analytics, catalog) run against the local datamart and endpoints.]
Disaster Recovery
All the application nodes and the web nodes are replicated using RSYNC; the specific install directory and any other log directories are synced to the secondary replication nodes.
For PostgreSQL, the streaming replication feature is used, which relies on archive log shipping.
SOLR comes with a built-in CDCR (Cross Data Center Replication) feature, which can be used for disaster recovery.
CouchDB has an outstanding replication architecture, which replicates the primary database to the secondary database.
The RPO can be set as per the needs, individually for both applications and databases. The RTO for the DR would be approximately an hour.
[Figure: DR replication paths — primary application & DB nodes replicate to the secondary site: application nodes via RSYNC, PostgreSQL nodes via streaming replication with archive log shipping, SOLR nodes via CDC replication, and CouchDB nodes via CouchDB replication.]
Application Monitoring

Third-Party Monitoring Tools
ChainSys uses third-party open-source monitoring tools such as Zabbix and Jenkins to monitor all the nodes. Zabbix supports tracking the availability and performance of the servers, virtual machines, applications (like Apache, Tomcat, ActiveMQ, and Java), and databases (like PostgreSQL, Redis, etc.) used in the Platform. Using Zabbix, the following are achieved:
- Various data collection methods and protocols
- Instant monitoring of all metrics using out-of-the-box templates
- Flexible trigger expressions and trigger dependencies
- Proactive network monitoring
- Remote command execution
- Flexible notifications
- Integration with external applications using the Zabbix API
Individual application monitoring systems can also be used for more in-depth analysis, but an integrated approach to looking into problems helps us be proactive and faster.
In-Built Monitoring System
ChainSys is working on its own application monitoring tool that monitors essential parameters like CPU and memory. This tool is also planned to help monitor individual threads within the application, and is intended to carry out most maintenance activities, like patching, cloning, and database maintenance, from one single toolset. It will be integrated with Zabbix for monitoring and alerting.
Supported Endpoints (Partial)
- Cloud Applications: Oracle Sales Cloud, Oracle Marketing Cloud, Oracle Engagement Cloud, Oracle CRM On Demand, SAP C/4HANA, SAP S/4HANA, SAP BW, SAP Concur, SAP SuccessFactors, Salesforce, Microsoft Dynamics 365, Workday, Infor Cloud, Procore, Planview Enterprise One
- PLM, MES & CRM: Windchill PTC, Oracle Agile PLM, Oracle PLM Cloud, Teamcenter, SAP PLM, SAP Hybris, SAP C/4HANA, Enovia, Proficy, Honeywell OptiVision, Salesforce Sales, Salesforce Marketing, Salesforce CPQ, Salesforce Service, Oracle Engagement Cloud, Oracle Sales Cloud, Oracle CPQ Cloud, Oracle Service Cloud, Oracle Marketing Cloud, Microsoft Dynamics CRM
- HCM & Supply Chain Planning: Oracle HCM Cloud, SAP SuccessFactors, Workday, ICON, SAP APO and IBP, Oracle Taleo, Oracle Demantra, Oracle ASCP, Steelwedge
- Project Management & EAM: Oracle Primavera, Oracle Unifier, SAP PM, Procore, Ecosys, Oracle EAM Cloud, Oracle Maintenance Cloud, JD Edwards EAM, IBM Maximo
- Enterprise Storage Systems: OneDrive, Box, SharePoint, File Transfer Protocol (FTP), Oracle Webcenter, Amazon S3
- Big Data: HIVE, Apache Impala, Apache HBase, Snowflake, MongoDB, Elasticsearch, SAP HANA, Hadoop, Teradata, Oracle Database, Redshift, BigQuery
- NoSQL Databases: MongoDB, Solr, CouchDB, Elasticsearch
- Databases: PostgreSQL, Oracle Database, SAP HANA, SYBASE, DB2, SQL Server, MySQL, MemSQL
- Message Broker: IBM MQ, ActiveMQ
- Development Platform: Java, .Net, Oracle PaaS, Force.com, IBM, ChainSys Platform
- Enterprise Applications: Oracle E-Business Suite, Oracle ERP Cloud, Oracle JD Edwards, Oracle PeopleSoft, SAP S/4HANA, SAP ECC, IBM Maximo, Workday, Microsoft Dynamics, Microsoft Dynamics GP, Microsoft Dynamics Nav, Microsoft Dynamics Ax, Smart ERP, Infor, BaaN, Mapics, BPICS
One Platform for your
Data Management needs
End to End
www.chainsys.com
Data Migration
Data Reconciliation
Data Integration
Data Quality Management
Data Governance
Analytical MDM
Data Analytics
Data Catalog
Data Security & Compliance
More Related Content

ODP
Authentication and Single Sing on
PPTX
Introduction-to-Blood-Bank-and-Donor-Management-System.pptx
PPT
SOSCOE Overview
PPT
Adobe PDF and LiveCycle ES Security
DOCX
How the detailed process of soa
PDF
APIsecure 2023 - API orchestration: to build resilient applications, Cherish ...
PDF
Web Based Investment Management System
PPTX
Granite state #spug The #microsoftGraph and #SPFx on steroids with #AzureFunc...
Authentication and Single Sing on
Introduction-to-Blood-Bank-and-Donor-Management-System.pptx
SOSCOE Overview
Adobe PDF and LiveCycle ES Security
How the detailed process of soa
APIsecure 2023 - API orchestration: to build resilient applications, Cherish ...
Web Based Investment Management System
Granite state #spug The #microsoftGraph and #SPFx on steroids with #AzureFunc...

Similar to Technical Architecture - Chainsys dataZap (20)

PPTX
presentation_finals
PPTX
Internship msc cs
PPTX
Blockchain solution architecture deliverable
PPTX
Disaster_Reovery1_Patrol_Continuity.pptx
PDF
Dairy management system project report..pdf
PPTX
Azure. Is It Worth It? - TechEd Beijing 2010 - Ethos
DOCX
Middleware – Its Types, Architecture, and Benefits.docx
DOCX
Web based booking a car taxi5
DOCX
project on Agile approach
PDF
PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.
PPTX
Observe It Presentation
PDF
International Journal of Engineering Inventions (IJEI)
PDF
Transforming The Customer Experience With Real-Time Insights
DOC
Java project titles
PDF
Online airline reservation system project report.pdf
PPTX
Online news 365
PPT
Cartes Asia Dem 2010 V2
PPT
Share Point Server Security with Joel Oleson
PDF
travel portal for flights booking trave
PDF
ghgh.pdf travel portal for flights booking right
presentation_finals
Internship msc cs
Blockchain solution architecture deliverable
Disaster_Reovery1_Patrol_Continuity.pptx
Dairy management system project report..pdf
Azure. Is It Worth It? - TechEd Beijing 2010 - Ethos
Middleware – Its Types, Architecture, and Benefits.docx
Web based booking a car taxi5
project on Agile approach
PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.
Observe It Presentation
International Journal of Engineering Inventions (IJEI)
Transforming The Customer Experience With Real-Time Insights
Java project titles
Online airline reservation system project report.pdf
Online news 365
Cartes Asia Dem 2010 V2
Share Point Server Security with Joel Oleson
travel portal for flights booking trave
ghgh.pdf travel portal for flights booking right
Ad

More from Chainsys SEO (12)

PDF
dataZen - Customer Data Info Management (CDM)
PDF
A Proven Approach to Enterprise Data Reconciliation
PDF
Data Sheet dataZap for Data and Setup Migration
PDF
Enterprise Data A Proven Approach to Cleansing and Migration
PDF
Oracle Cloud Applications Master and Transactions Templates
PDF
SAP S4 HANA Applications Setup Master and Transactions Templates
PDF
Oracle Cloud Application dataZap Setup Templates
PDF
Data Sheet Cloud Integration Platform - dataZap
PDF
Seamless Data Migration to Oracle Fusion Cloud
PDF
SAP to Hadoop data integration process Steps
PDF
Archive Smart & Save Big Time with ChainSys AI-Powered Smart Data Platform
PDF
Strategic Data Migration From Oracle NetSuite to Cloud Fusion Final.pdf
dataZen - Customer Data Info Management (CDM)
A Proven Approach to Enterprise Data Reconciliation
Data Sheet dataZap for Data and Setup Migration
Enterprise Data A Proven Approach to Cleansing and Migration
Oracle Cloud Applications Master and Transactions Templates
SAP S4 HANA Applications Setup Master and Transactions Templates
Oracle Cloud Application dataZap Setup Templates
Data Sheet Cloud Integration Platform - dataZap
Seamless Data Migration to Oracle Fusion Cloud
SAP to Hadoop data integration process Steps
Archive Smart & Save Big Time with ChainSys AI-Powered Smart Data Platform
Strategic Data Migration From Oracle NetSuite to Cloud Fusion Final.pdf
Ad

Recently uploaded (20)

PDF
Building a Smart Pet Ecosystem: A Full Introduction to Zhejiang Beijing Techn...
PDF
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi
PDF
How to Get Funding for Your Trucking Business
PPT
Lecture 3344;;,,(,(((((((((((((((((((((((
PDF
TyAnn Osborn: A Visionary Leader Shaping Corporate Workforce Dynamics
PDF
Nante Industrial Plug Factory: Engineering Quality for Modern Power Applications
PPTX
operations management : demand supply ch
PPTX
Sales & Distribution Management , LOGISTICS, Distribution, Sales Managers
PDF
Introduction to Generative Engine Optimization (GEO)
PDF
Family Law: The Role of Communication in Mediation (www.kiu.ac.ug)
PDF
kom-180-proposal-for-a-directive-amending-directive-2014-45-eu-and-directive-...
PPTX
2025 Product Deck V1.0.pptxCATALOGTCLCIA
PPTX
Astra-Investor- business Presentation (1).pptx
PDF
Keppel_Proposed Divestment of M1 Limited
PDF
Cours de Système d'information about ERP.pdf
PDF
Digital Marketing & E-commerce Certificate Glossary.pdf.................
PDF
Charisse Litchman: A Maverick Making Neurological Care More Accessible
PDF
Solara Labs: Empowering Health through Innovative Nutraceutical Solutions
PPTX
svnfcksanfskjcsnvvjknsnvsdscnsncxasxa saccacxsax
PDF
Blood Collected straight from the donor into a blood bag and mixed with an an...
Building a Smart Pet Ecosystem: A Full Introduction to Zhejiang Beijing Techn...
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi
How to Get Funding for Your Trucking Business
Lecture 3344;;,,(,(((((((((((((((((((((((
TyAnn Osborn: A Visionary Leader Shaping Corporate Workforce Dynamics
Nante Industrial Plug Factory: Engineering Quality for Modern Power Applications
operations management : demand supply ch
Sales & Distribution Management , LOGISTICS, Distribution, Sales Managers
Introduction to Generative Engine Optimization (GEO)
Family Law: The Role of Communication in Mediation (www.kiu.ac.ug)
kom-180-proposal-for-a-directive-amending-directive-2014-45-eu-and-directive-...
2025 Product Deck V1.0.pptxCATALOGTCLCIA
Astra-Investor- business Presentation (1).pptx
Keppel_Proposed Divestment of M1 Limited
Cours de Système d'information about ERP.pdf
Digital Marketing & E-commerce Certificate Glossary.pdf.................
Charisse Litchman: A Maverick Making Neurological Care More Accessible
Solara Labs: Empowering Health through Innovative Nutraceutical Solutions
svnfcksanfskjcsnvvjknsnvsdscnsncxasxa saccacxsax
Blood Collected straight from the donor into a blood bag and mixed with an an...

Technical Architecture - Chainsys dataZap

  • 2. Objectives ChainSys’ Smart Data Platform enables the business to achieve these critical needs. 1. Empower the organization to be data-driven 2. All your data management problems solved 3. World class innovation at an accessible price Subash Chandar Elango Chief Product Officer ChainSys Corporation Subash's expertise in the data management sphere is unparalleled. As the creative & technical brain behind ChainSys' products, no problem is too big for Subash, and he has been part of hundreds of data projects worldwide.
  • 3. Introduction This document describes the Technical Architecture of the Chainsys Platform Purpose Scope The purpose of this Technical Architecture is to define the technologies, products, and techniques necessary to develop and support the system and to ensure that the system components are compatible and comply with the enterprise-wide standards and direction defined by the Agency. The document's scope is to identify and explain the advantages and risks inherent in this Technical Architecture. This document is not intended to address the installation and configuration details of the actual implementation. Installation and configuration details are provided in technology guides produced during the project. Audience The intended audience for this document is Project Stakeholders, technical architects, and deployment architects
  • 4. Platform Component Definition The system's overall architecture goals are to provide a highly available, scalable, & flexible data management platform A key Architectural goal is to leverage industry best practices to design and develop a scalable, enterprise-wide J2EE application and follow the industry-standard development guidelines. All aspects of Security must be developed and built within the application and be based on Best Practices. Security User Management Base Component Gateway Component Authentication / Authorization / Crypto User / Groups Roles / Responsibility Access Manager Workflow Versioning Notification Logging Scheduler Object Manager API Gateway Data Quality Management Master Data Governance Analytical MDM (Customer 360, Supplier360, Product 360) Data Migration Setup Migration Test data Prep Big Data Ingestion Data Archival Data Reconciliation Data Integration Data Masking Data Compliance (PII, GDPR, CCPA, OIOO) Data Cataloging Data Analytics Data Visualizat Used for Autonomous Regression Testing Used for Load and Performance Testing Rapid Application Develiopment (RAD) Framework Visual Development Approach Drag & Drop Design Tools Functional Components into Visual Workflow Foundation Smart data platform Smart Business Platform Architecture Goals
  • 5. The Platform Foundation forms the base on which the entire Platform is built. The major components that create the Platform are described in brief. Security Management User Management Base Components Gateway Component Users User Groups Object Access Manager Responsibilities Role Hierarchy JWT SAML OAuth2.0 Federated Authentication Platform Authentication Credential Authentication AD LDAP Authentication Service Credential Authenticator SSO Authenticator Authorization Engine Org/License Authorization App / Node Authorization Access Authorization Hashing Algorithm Asymmetric Encryption Crypto Engine MD5 SHA1 AES 128 AES 256 Platform API API Gateway Engine Login API REST API Publisher SOAP Service Publisher Job Feedback Workflow Logging Constructs Application Logs Execution Logs Audit Logs Approvals Activities SLA Collaborate Versioning EMAIL SVN GIT Database Web Notification Chats Platform Object Manager Object Sharing Scheduler Job Schedular Job Feedback Dependent Sharing Sharing Manager Platform Foundation
  • 6. User Management
This component manages all Roles, Responsibilities, Hierarchies, Users, and User Groups.
Responsibilities: The Platform comes with preconfigured Responsibilities for dataZap, dataZen, and dataZense. Organizations can also define their own Responsibilities, which are assigned to platform objects with additional privileges.
Roles: The Platform comes with predefined Roles for dataZap, dataZen, and dataZense, and organizations can create their own. A role-based hierarchy is also configured for the user-level hierarchy, and each role is assigned default responsibilities.
Users: Users are assigned only the applications they need, and each user is given a Role. The reporting hierarchy is formed through the role-hierarchy setup, where a manager from the next role up is assigned. The responsibilities attached to a role are granted by default; a user can be given additional responsibilities, or have an existing responsibility revoked, against a role. Users gain access to objects based on the privileges assigned to their responsibilities.
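As an illustration of this model, here is a minimal sketch of how a user's effective responsibilities could be resolved from role defaults plus individual grants and revocations. The class and method names are illustrative only, not the Platform's actual API:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model: a Role carries default Responsibilities; a User may
// have extra grants or revocations on top of the role defaults.
class Role {
    final String name;
    final Set<String> defaultResponsibilities = new HashSet<>();
    Role(String name) { this.name = name; }
}

class User {
    final String userName;
    final Role role;
    final Set<String> grantedResponsibilities = new HashSet<>();
    final Set<String> revokedResponsibilities = new HashSet<>();
    User(String userName, Role role) { this.userName = userName; this.role = role; }

    // Effective responsibilities = role defaults + individual grants - revocations.
    Set<String> effectiveResponsibilities() {
        Set<String> effective = new HashSet<>(role.defaultResponsibilities);
        effective.addAll(grantedResponsibilities);
        effective.removeAll(revokedResponsibilities);
        return effective;
    }
}

public class UserManagementSketch {
    public static void main(String[] args) {
        Role dataSteward = new Role("Data Steward");
        dataSteward.defaultResponsibilities.add("dataZen:RuleAuthoring");
        dataSteward.defaultResponsibilities.add("dataZense:ViewDashboards");

        User user = new User("jdoe", dataSteward);
        user.revokedResponsibilities.add("dataZense:ViewDashboards"); // revoked per user

        System.out.println(user.effectiveResponsibilities()); // [dataZen:RuleAuthoring]
    }
}
```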
  • 7. Security Management
The security management component covers the following.
SSL Authentication Engine: The Platform is SSL/HTTPS enabled on the transport layer with TLS 1.2 support. SSL is applied to the nodes exposed to users, such as the DMZ and Web nodes, and to the nodes exposed to third-party applications, such as the API Gateway nodes.
Credential Authentication: The Platform offers credential-based authentication handled by the Platform itself, as well as Single Sign-On based federated authentication; both can co-exist for an organization. Users authenticate with their supplied credentials. All successful sessions are logged, and failed attempts are tracked at the user level for locking the account. A user can have only one web session at any given point in time. The password policy, including expiry, is configured at the organization level and applies to all users. Enforced password complexity covers minimum length, maximum length, and the required use of numbers, cases, and special characters; the number of unsuccessful attempts before lockout is also configurable.
Single Sign-On: SSO can be set up with federated services such as SAML, OAuth 2.0, or JWT (JSON Web Tokens). An identity provider (IdP) is configured against the organization profile, and authentication happens in the IdP; the flow can be either IdP-initiated or SP-initiated (the SP being the ChainSys Smart Data Platform). Organization users with SSO log in through a different context.
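The configurable complexity rules described above can be pictured with a small sketch; the parameter names and rule set here are illustrative, not the Platform's actual configuration API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative password-policy check mirroring the configurable rules the
// slide describes: min/max length and required character classes.
public class PasswordPolicySketch {
    static List<String> validate(String password, int minLen, int maxLen,
                                 boolean needDigit, boolean needMixedCase, boolean needSpecial) {
        List<String> violations = new ArrayList<>();
        if (password.length() < minLen) violations.add("shorter than " + minLen);
        if (password.length() > maxLen) violations.add("longer than " + maxLen);
        if (needDigit && !password.chars().anyMatch(Character::isDigit))
            violations.add("missing digit");
        if (needMixedCase && !(password.chars().anyMatch(Character::isUpperCase)
                && password.chars().anyMatch(Character::isLowerCase)))
            violations.add("missing mixed case");
        if (needSpecial && password.chars().allMatch(Character::isLetterOrDigit))
            violations.add("missing special character");
        return violations; // an empty list means the password satisfies the policy
    }

    public static void main(String[] args) {
        System.out.println(validate("Weak1", 8, 64, true, true, true));
        // -> [shorter than 8, missing special character]
    }
}
```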
  • 8. Authorization Engine
The first level of authorization is the Organization License; the licensing engine is also used to set up the organization for authentication. The next level is the applications assigned to the organization and to the respective user; individual application nodes are provisioned for the organization per the service agreement to handle load balancing and high availability. Authorization of pages is driven by the responsibilities assigned to users, while authorization of a record is driven by sharing the record with a group or with individual users. Responsibilities and sharing carry the respective privileges to pages and records; on conflict, the principle of least privilege determines the access.
Crypto Engine
The Crypto Engine handles both asymmetric encryption and hashing algorithms. AES 128 is the default encryption algorithm, with 256-bit keys also supported. Keys are managed within the Platform at the organization level, and key usage maintained at the application level determines how they are applied for encryption and decryption. All internal passwords are stored with MD5 hashing by default, and encryption of the stored data can be done at the database layer as needed.
Workflow Engine
The workflow engine manages the orchestration of the flow of activities. It is part of the platform foundation and is extended by the applications to add application-specific activities.
Version Management
This component handles the versioning of objects and records eligible for versioning. The foundation provides the API to version objects and their records, which the applications can extend with specific functionality. Currently, the Platform supports SVN as the default and also supports database-level version management; support for GIT is on the roadmap.
Notification Engine
The notification engine delivers all notifications to users in the system, including on-page notifications when the user is online in the application. Other notifications, such as mail and chat notifications, are also part of this component.
Logging Engine
All activity logs, for both the foundation and the applications, are handled here to aid understanding and debugging.
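As a sketch of the Crypto Engine's symmetric side, the following uses the standard javax.crypto API with AES-128, the default key size named above. The GCM mode and the in-memory key are assumptions for the demonstration only; real keys are managed at the organization level:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

// Illustrative symmetric encryption with AES-128 (256-bit keys are also
// supported by the Platform). A fresh key is generated just for the demo.
public class CryptoEngineSketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128); // default key size per the slide; 256 also supported
        SecretKey orgKey = keyGen.generateKey();

        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv); // unique IV per encryption

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, orgKey, new GCMParameterSpec(128, iv));
        byte[] encrypted = cipher.doFinal("sensitive column value".getBytes(StandardCharsets.UTF_8));
        System.out.println(Base64.getEncoder().encodeToString(encrypted));

        // Decryption with the same key and IV, as applied before loading to the target.
        cipher.init(Cipher.DECRYPT_MODE, orgKey, new GCMParameterSpec(128, iv));
        System.out.println(new String(cipher.doFinal(encrypted), StandardCharsets.UTF_8));
    }
}
```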
  • 9. API Gateway Engine
The API Gateway forms the foundation for publishing and consuming services with the Platform. All eligible jobs or actions can be published for external applications to access. The following components make up the publishing node.
Login Service: The login service authenticates whether the requesting consumer has the proper credentials to invoke the job or action. The publisher engine supports two methods of authentication:
- Inline authentication, where every request carries the credentials for authentication and access control.
- Session authentication, where this service is invoked explicitly to obtain a token, and the other published services are then called with this token to authorize the request.
SOAP Service: Eligible jobs or actions can be published using the Simple Object Access Protocol (SOAP), a messaging protocol that allows programs running on disparate operating systems to communicate using Hypertext Transfer Protocol (HTTP) and Extensible Markup Language (XML).
REST Service: Eligible jobs or actions can be published using Representational State Transfer (REST). REST communicates over HTTP like SOAP but supports messages in multiple formats; dataZap publishes in XML or JSON (JavaScript Object Notation) format.
Scheduler Creation: A job can be scheduled once or on a recurring basis; recurring jobs can be planned minutely, hourly, weekly, and monthly.
Scheduler Execution: The scheduler execution engine uses the configuration and fires the job in the respective application. The next run is scheduled at the end of each job, as per the configuration.
Job Monitoring: Scheduled jobs are monitored, and their progress and status are tracked at every stage. If a job is delayed or hits unexpected errors, the responsible users are notified for action.
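To make the session-authentication flow concrete, here is a minimal consumer-side sketch using the JDK 11 HttpClient: log in once for a token, then invoke a published REST job with it. The endpoint paths, payload, and header scheme are hypothetical:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative consumer of the session-authentication method: call the Login
// API for a token, then pass the token when invoking a published REST job.
public class GatewayClientSketch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Step 1: explicit login to obtain a session token.
        HttpRequest login = HttpRequest.newBuilder()
                .uri(URI.create("https://platform.example.com/api/login"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"user\":\"api_user\",\"password\":\"secret\"}"))
                .build();
        String token = http.send(login, HttpResponse.BodyHandlers.ofString()).body();

        // Step 2: invoke a published job, authorizing the request with the token.
        HttpRequest runJob = HttpRequest.newBuilder()
                .uri(URI.create("https://platform.example.com/api/jobs/customer-sync/run"))
                .header("Authorization", "Bearer " + token)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println(http.send(runJob, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```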
  • 10. dataZap Component
The Execution Handler is available both on the client side and in the cloud, to support pure-cloud environments and to manipulate data in the cloud with less load at the client end. The Execution Controller runs in the cloud and directs the Execution Handler at every step.
[Diagram: dataZap component architecture: endpoint connectors (JCO, JDBC, REST, SOAP, OData) to relational databases, cloud applications, big data lakes, NoSQL databases, enterprise storage systems, message brokers, and enterprise applications; Extract Adapter (data object engine, extract engine, filter engine, child iterator, CDC engine, crypto engine, data stream service); Dataflow Adapter (mapper, passive transformations with lookup/sequence/expression, active transformations with aggregator, sorter, unifier, joiner, normalizer, router, validation engine, reprocessing engine, reconciliation engine); Load Adapter (ingestion engine, pre-load, post-load, data load engine, crypto engine, BOTS playback and builder objects); Process Flow and Reconciliation Adapters with comparator and visualization API; migration, master/transaction, and setup migration flows; scheduler (job initiation, exception notification); API Gateway with REST API publisher; versioning engine for object versioning; reporting engine; localized transformation API (TAPI).]
  • 11. Endpoint Connectors
This component provides the base connectors used to connect to most endpoint applications. The base connectors include:
- JDBC, for all RDBMS connections
- SAP JCo, for connecting to SAP systems
- SOAP, for applications enabled with SOAP APIs
- REST, for applications enabled with REST APIs
- OData, for applications enabled with OData APIs
- FTP, to connect and extract data from files on FTP sites
- NoSQL, to connect to NoSQL databases such as MongoDB
- Message Broker, to connect to messaging services such as ActiveMQ and IBM MQ
Connections through all of the above base connectors can be secured via a secured layer, based on the endpoint configuration. The specific connectors for enterprise applications are wrappers built over these base connectors, with the particular security and governance each application needs. The diagram shows a few of the existing wrapper endpoints created for enterprise applications in the market. For applications that have no specific connector, the base connectors can be used, provided the endpoint requires no particular authentication method beyond the base-level authentication provided. ChainSys builds application-specific connectors where they do not already exist.
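A minimal sketch of what extraction through the JDBC base connector looks like; the connection details, credentials, and table are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative use of the JDBC base-connector style: open a connection to an
// RDBMS endpoint and stream rows out. Enterprise-specific connectors wrap
// this with the extra security and governance the application needs.
public class JdbcConnectorSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://source-db.example.com:5432/erp";
        try (Connection conn = DriverManager.getConnection(url, "extract_user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rows = stmt.executeQuery("SELECT customer_id, name FROM customers")) {
            while (rows.next()) {
                System.out.println(rows.getLong("customer_id") + " -> " + rows.getString("name"));
            }
        }
    }
}
```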
  • 12. Load Adapter
The component that handles loading data into multiple systems or endpoints.
Ingestion Engine: Initial data marting happens in this engine, after which the data is manipulated.
Crypto Engine: The Crypto Engine encrypts the data during the data-marting process to ensure the data is protected in all formats. It also applies decryption before loading the data to the final target endpoint.
Reconciliation Engine: The reconciliation engine handles the technical reconciliation of the data in two stages: at the end of the pre-validation stage, to determine the differences between the raw data and the transformed data, and again after loading completes, to determine the differences between the loaded data and the raw or changed data.
Reprocessing Engine: This component helps correct errored data both at the pre-validation level and at the post-load level. Data fixes can be handled online or offline: users can download the error data as an Excel file and upload the corrected data as a bulk update. Beyond error correction, data can also be enhanced or constructed so that it passes the validation step for quality.
Loading Engine: The loading engine is where the application identifies the endpoint type and uses the matching engine to load the data into the target application. It also has special adapters to use the Playback Adapters of Smart BOTS and Smart App Builder in the business platform.
Passive Transformation: This transformation only changes column values from one form to another; the row count is unaffected. The available types include lookup, sequence, and expression-based transformations.
TAPI: TAPI helps create reusable transformations (APIs) shared across multiple objects, so a change is made in one place rather than in numerous areas.
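To illustrate the passive-transformation and TAPI ideas, the sketch below models a transformation as a reusable function (define once, apply across objects); the lookup table and expression are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative passive transformations: each changes column values without
// affecting the row count. Modelling a transformation as a reusable function
// mirrors the TAPI idea of defining it once and reusing it across objects.
public class PassiveTransformationSketch {
    // Lookup transformation: map a source code to a target value via a table.
    static Function<String, String> lookup(Map<String, String> table, String defaultValue) {
        return value -> table.getOrDefault(value, defaultValue);
    }

    // Expression transformation: derive a value from the input by an expression.
    static Function<String, String> upperCaseTrim() {
        return value -> value == null ? null : value.trim().toUpperCase();
    }

    public static void main(String[] args) {
        Map<String, String> countryCodes = new HashMap<>();
        countryCodes.put("US", "United States");
        countryCodes.put("IN", "India");

        Function<String, String> countryLookup = lookup(countryCodes, "Unknown");
        System.out.println(countryLookup.apply("IN"));         // India
        System.out.println(upperCaseTrim().apply("  acme "));  // ACME
    }
}
```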
  • 13. Extract Adapter
The extract adapter retrieves data from many different types of endpoints and processes it into the expected format.
Data Object Engine: This engine handles almost all kinds of systems and formats for retrieving data; it can work with SQL, flat files, and SOAP and REST services.
Filters: Filters reduce the number of rows taken from the raw data extracted from the source.
Child Iterator: This component handles the master-child relationship between data extracts, so that a filter applied on the master propagates down to all child levels.
Crypto Engine: The Crypto Engine reads or extracts data with encryption applied over the fields selected for extraction. This prevents the data from being accessed in the clear from either the front end or the back end.
Data Streaming Service: Data extracted from the data object or the extract adapter is streamed to the applications that call the service to pull data from the endpoint.
Changed Data Capture: This component gets the changed data from the source. Two modes are available. The recommended option is to assign a date field used to bring in the changed data. Alternatively, changes can be detected by comparing records, which is resource-intensive and not recommended unless no date field is available for comparison.
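A minimal sketch of the recommended date-field CDC mode over JDBC: pull only rows changed since the last successful run, then advance the watermark. The table, column, and watermark handling are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

// Illustrative date-field CDC: select rows whose update timestamp is newer
// than the persisted watermark, and compute the next watermark as we go.
public class ChangedDataCaptureSketch {
    public static void main(String[] args) throws Exception {
        Timestamp lastRun = Timestamp.valueOf("2024-01-01 00:00:00"); // persisted watermark

        String url = "jdbc:postgresql://source-db.example.com:5432/erp";
        String sql = "SELECT order_id, status, last_update_date FROM orders WHERE last_update_date > ?";
        try (Connection conn = DriverManager.getConnection(url, "extract_user", "secret");
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setTimestamp(1, lastRun);
            try (ResultSet changed = stmt.executeQuery()) {
                Timestamp newWatermark = lastRun;
                while (changed.next()) {
                    System.out.println("changed order: " + changed.getLong("order_id"));
                    Timestamp updated = changed.getTimestamp("last_update_date");
                    if (updated.after(newWatermark)) newWatermark = updated;
                }
                // Persist newWatermark for the next scheduled extract.
                System.out.println("next watermark: " + newWatermark);
            }
        }
    }
}
```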
  • 14. Active Transformation
Active transformations affect the number of rows. The available active transformations are the Normalizer, Joiner, Router, Unifier, Aggregator, and Sorter. The active transformation engines convert the data structures from the source to the target. The rules engine (Router) moves data to different endpoints according to rules, and data between two systems can be compared to determine an action before moving it.
Dataflow Adapter: The dataflow adapter helps transform and map data from multiple sources to multiple target systems.
Migration Flow: This component overrides the workflow component in the foundation. The migration flow engine is specific to migrating master or transaction data, with orchestration capabilities and human-intervention steps such as Approval, User Confirmation, and Receive Input.
Process Flow Engine: This component also overrides the foundation workflow component. The process flow engine is specific to data movement, with the full set of orchestration capabilities and human-intervention steps such as Approval, User Confirmation, and Receive Input.
Scheduler: These components provide the dataZap-specific execution agents for jobs run by the base scheduling engine. They are wrappers for the data movement components: Load Adapters, Extract Adapters, Dataflow Adapters, and Process Flows.
API Gateway: These are execution agents for the publisher in the foundation, wrapping the jobs to be executed in the data movement components: Load Adapters, Extract Adapters, Dataflow Adapters, and Process Flows.
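The row-count-changing behavior can be illustrated with two of the named transformations, a Router and an Aggregator, sketched here with Java streams; the record layout and the routing rule are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative active transformations: a Router that splits rows by a rule
// and an Aggregator that collapses rows; both change the row count.
public class ActiveTransformationSketch {
    static class Order {
        final String region;
        final double amount;
        Order(String region, double amount) { this.region = region; this.amount = amount; }
        public String toString() { return region + ":" + amount; }
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("EU", 120.0), new Order("US", 80.0), new Order("EU", 40.0));

        // Router: rows matching the rule go to one endpoint, the rest elsewhere.
        Map<Boolean, List<Order>> routed =
                orders.stream().collect(Collectors.partitioningBy(o -> o.amount >= 100));
        System.out.println("high-value route: " + routed.get(true));

        // Aggregator: one output row per region (3 input rows -> 2 output rows).
        Map<String, Double> totals = orders.stream()
                .collect(Collectors.groupingBy(o -> o.region,
                        Collectors.summingDouble(o -> o.amount)));
        System.out.println("totals by region: " + totals);
    }
}
```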
  • 15. Reconciliation Adapter
The reconciliation adapter generates the query to compare the data and produces the Visualization API result used to create the necessary reconciliation dashboard.
Reporting Engine
The reporting engine generates reports on the execution of the various adapters and produces dashboards for understanding the actions taken and still to be taken.
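A minimal sketch of the kind of comparison a reconciliation query could run: matching record counts and a control sum between source and target. The connection details and query are placeholders; the actual adapter generates these queries per object:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative technical reconciliation: compare record counts and a control
// sum between a source table and a target table.
public class ReconciliationSketch {
    static long[] countAndSum(String url, String table) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, "recon_user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM " + table)) {
            rs.next();
            return new long[] { rs.getLong(1), rs.getLong(2) };
        }
    }

    public static void main(String[] args) throws Exception {
        long[] source = countAndSum("jdbc:postgresql://source-db.example.com:5432/erp", "orders");
        long[] target = countAndSum("jdbc:postgresql://target-db.example.com:5432/erp", "orders");
        System.out.println("count difference: " + (source[0] - target[0]));
        System.out.println("control-sum difference: " + (source[1] - target[1]));
    }
}
```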
  • 16. System Technology Landscape
[Diagram: System technology landscape: DMZ nodes running the Apache HTTPD server (web load balancing, reverse and forward proxy, single sign-on); web application nodes on Apache Tomcat 9; foundation nodes (caching node, scheduler node, file/log server); Apache ActiveMQ collaborate server; API gateway; analytics and tooling stack (R analytics, dimple.js, NodeJS 12.16 with Ionic V4, Selenium WebDriver); default data stores (metadata store, versioning store, data mart, indexing store, app data store, CouchDB).]
  • 17. DMZ Nodes
These nodes are generally the only nodes exposed to the external world outside the enterprise network. The two nodes in this layer are the Apache HTTPD server and the Single Sign-On node.
Apache HTTPD: The Apache HTTPD server routes calls to the Web nodes and also handles load balancing for both the Web Server nodes and the API Gateway nodes. The following features are used:
- Highly scalable forward/reverse proxy with caching
- Multiple load balancing mechanisms
- Fault tolerance and failover with automatic recovery
- WebSocket support with caching
- Fine-grained authentication and authorization access control
- Loadable dynamic modules, such as ModSecurity for WAF
- TLS/SSL with SNI and OCSP stapling support
Single Sign-On: This node is built on the Spring Boot application with Tomcat as the servlet container. Organizations opting for single sign-on get a separate SSO node with a particular context; the default context takes users to platform-based authentication.
Web Nodes
This layer consists of the nodes exposed to users for invoking actions through the front end, or to third-party applications as APIs. The nodes in this layer are the Web Server that renders the web pages, the API Gateway through which other applications interact with the Platform, and the Collaborate node for notifications.
  • 18. Web Server
The web application server hosts all the web pages of the ChainSys Platform.
- Apache Tomcat 9.x is used as the servlet container.
- JDK 11 is the JRE used for the application; the Platform works on OpenJDK, Azul Zulu, AWS Corretto, and Oracle JDK.
- Struts 1.3 is used as the controller framework.
- Integration between the web server and the application nodes is handled with microservices based on Spring Boot.
- The presentation layer uses HTML 5 / CSS 3 components and scripting frameworks such as jQuery and d3.js.
- The web server can be clustered to n nodes according to the number of concurrent users and requests.
Gateway Node
This node uses the Jetty service to publish APIs as SOAP or REST. The API Gateway can be clustered based on the number of concurrent API calls from external systems. Denial-of-Service (DoS) protection is implemented for both JAX-WS and JAX-RS to prevent attacks.
Collaborate
This node handles all the different kinds of notifications to users, such as front-end notifications, emails, and push notifications (on the roadmap), and uses all the default application services. The notification engine uses Netty APIs to send notifications from the Platform, and Apache ActiveMQ is used for messaging the notifications from the application nodes. This node also provides chat services that the applications can use as needed.
  • 19. Application Nodes
The application nodes are Spring Boot applications that communicate with the other application nodes and the web servers. Load balancing is handled by HAProxy, based on the number of nodes instantiated for each application. JDK 11 is the JRE used for the application; the Platform works on OpenJDK, Azul Zulu, AWS Corretto, and Oracle JDK.
Most application nodes use only the default services mentioned above. The analytical services / catalog services node additionally uses R analytics for machine-learning algorithms, and D3 and Dimple JS for the visual layer.
  • 20. The automation services node uses all the default services mentioned above and, in addition, the Selenium API for web-based automation, as well as Sikuli.
The application-builder services node also uses all the default services mentioned above; these services configure the custom applications and generate dynamic web applications as configured. The mobile application services require NodeJS 12.16, which uses the Ionic Framework V4 to build the web and mobile apps for the configured custom applications.
  • 21. Data Storage Nodes
Database: The ChainSys Platform supports both PostgreSQL 9.6 or higher and Oracle 11g or higher for both of the following:
1. Metadata of the setups and configurations of the applications
2. Data marting for the temporary storage of data
The Platform uses PostgreSQL for the metadata in the cloud. PostgreSQL is a highly scalable database:
- It is designed to scale vertically, by running on bigger and faster servers when more performance is needed.
- It can be configured for horizontal scaling; Postgres has useful streaming-replication features, so multiple replicas can be created for reading data.
- Based on the above, it can easily be configured for High Availability.
Scheduler Node: This node uses only the default application node services and can be clustered only as failover nodes. When the primary node is down, HAProxy promotes the secondary node to primary. The secondary node then handles notifications and the automatic rescheduling of jobs, calling each schedulable application object so that all possible exception scenarios are addressed. Once the original node is up and running again, it becomes the secondary node.
  • 22. Cache Server
Redis is used for caching the platform configuration objects and execution progress information. This avoids network latency across the database and thus increases the performance of the application. When durability of the data is not needed, the in-memory nature of Redis allows it to perform well compared to database systems that write every change to disk before considering a transaction committed. The component is set up as a distributed cache service to enable better performance during data access. Redis can run as HA-enabled clusters and supports master-replica replication.
The multi-tenant database architecture is designed around the following:
- A separate database for each tenant
- Trusted database connections for each tenant
- Secure database tables for each tenant
- Easy extensibility through custom columns
- Scalability handled by single-tenant scale-out
PostgreSQL offers encryption at several levels: password storage encryption, encryption for specific columns, data partition encryption, encrypting passwords across a network, encrypting data across a network, SSL host authentication, and client-side encryption. This provides flexibility in protecting data from disclosure due to database server theft, unscrupulous administrators, and insecure networks. Encryption might also be required to secure sensitive data.
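A minimal cache-aside sketch with the Jedis client, along the lines of caching a configuration object with a TTL; the host, key, and loader stub are placeholders:

```java
import redis.clients.jedis.Jedis;

// Illustrative cache-aside pattern over Redis: read the configuration object
// from the cache first and fall back to the database on a miss, caching the
// result with a TTL so repeated reads avoid the metadata store.
public class RedisCacheSketch {
    static String loadConfigFromDatabase(String key) {
        return "{\"object\":\"" + key + "\"}"; // stand-in for a metadata-store query
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("cache-node.example.com", 6379)) {
            String key = "platform:config:dataflow-42";
            String config = jedis.get(key);
            if (config == null) {                       // cache miss
                config = loadConfigFromDatabase(key);   // hit the metadata store once
                jedis.setex(key, 300, config);          // cache for 5 minutes
            }
            System.out.println(config);
        }
    }
}
```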
  • 23. [Diagram: objects versioned by the applications: loader adapters, data objects, data extracts, data flows, process flows, migration flows, and reconciliations; data models, rules, augmentations, and workflows; data sets, views, dashboards, and ad-hoc reports; object models, layouts, and workflows.]
File / Log Server
This component is used for centralized logging, handling the application logs, execution logs, and error logs on a common server for the platform applications. Log4J is used for distributed logging. These logs can be downloaded for monitoring and auditing purposes; a small HTTP service allows users to download the files from this component. It is implemented with the single-tenant scale-out approach.
Subversion (SVN) Server
Apache Subversion (abbreviated SVN) is a software versioning and revision control system distributed as open source under the Apache License. The Platform uses SVN to version all the metadata configurations, either to revert within the same instance or to move configurations across instances for different milestones. All the applications in the Platform use the foundation APIs to version their objects as needed.
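A minimal sketch of logging through the Log4j API (shown here with Log4j 2) against the three log streams named earlier; the logger names are illustrative, and in a centralized setup the appenders would be configured to write to the file/log server:

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Illustrative use of the Log4j 2 API for the application, execution, and
// audit log streams; appender routing lives in the Log4j configuration.
public class LoggingSketch {
    private static final Logger appLog = LogManager.getLogger("ApplicationLogs");
    private static final Logger execLog = LogManager.getLogger("ExecutionLogs");
    private static final Logger auditLog = LogManager.getLogger("AuditLogs");

    public static void main(String[] args) {
        appLog.info("dataflow adapter initialized");
        execLog.info("job {} started by scheduler", "customer-sync");
        auditLog.warn("responsibility {} revoked for user {}", "RuleAuthoring", "jdoe");
    }
}
```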
  • 24. Apache Solr
The ChainSys Platform uses Solr as the indexing and search engine for its data cataloging needs. Solr is an open-source enterprise search platform; its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features, and rich document handling. Apache Solr was chosen over the alternatives for the following reasons.
Real-Time, Massive Read and Write Scalability: Solr supports large-scale, distributed indexing, search, and aggregation/statistics operations, enabling it to handle both large and small applications. Solr also supports real-time updates and can take millions of writes per second.
SQL and Streaming Expressions/Aggregations: Streaming expressions and aggregations provide the basis for running traditional data-warehouse workloads on a search engine, enhanced by much more complex matching and ranking criteria.
Security Out of the Box: Security is built in, integrating with systems like Kerberos, SSL, and LDAP to secure the system and the content inside it.
Fully Distributed Sharding Model: Solr moved from a master-replica model to a fully distributed sharding model in Solr 4, focusing on consistency and accuracy of results over other distributed approaches.
Cross-Data Center Replication Support: Solr supports active-passive CDCR, enabling applications to synchronize indexing operations across data centers in different regions without third-party systems.
Big Data Enablement: Users can store Solr's data in HDFS; Solr integrates with Hadoop's authentication approaches and leverages ZooKeeper to simplify the fault-tolerance infrastructure.
Documentation and Support: Solr has an extensive reference guide covering the functional and operational aspects of every version.
Machine Learning: Solr is actively adding capabilities to make Learning to Rank (LTR) an out-of-the-box functionality.
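A minimal catalog-search sketch with the SolrJ client; the Solr URL, collection, and field names are placeholders:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Illustrative catalog search: a faceted full-text query against a Solr
// collection holding catalog entries.
public class CatalogSearchSketch {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://solr-node.example.com:8983/solr/data_catalog").build()) {
            SolrQuery query = new SolrQuery("dataset_name:customer*");
            query.setRows(10);
            query.addFacetField("source_system"); // faceted search over catalog entries

            QueryResponse response = solr.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("dataset_name"));
            }
        }
    }
}
```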
  • 25. Apache CouchDB
The ChainSys Platform uses CouchDB for the mobile applications in the Application Builder module. PostgreSQL is the initial entry point for the dynamic web applications, and its data syncs to CouchDB when mobile applications are enabled. In contrast, the initial entry point for the dynamic mobile applications is PouchDB: CouchDB syncs with the PouchDB instances on the mobile devices, and then syncs with PostgreSQL. The main reasons for choosing CouchDB are:
- CouchDB uses HTTP and REST as its primary means of communication, so client apps can talk to the database directly.
- The Couch Replication Protocol lets data flow seamlessly between server clusters, mobile phones, and web browsers, enabling a compelling offline-first user experience while maintaining high performance and strong reliability.
- CouchDB was designed from the bottom up to enable easy synchronization between different databases.
- CouchDB uses JSON as its data format.
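Because CouchDB's API is plain HTTP/JSON, a document can be created with a simple PUT. The sketch below uses the JDK 11 HttpClient; the host, database, document, and credentials are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative interaction with CouchDB over its native HTTP/JSON API:
// create a document with PUT /{db}/{docid}.
public class CouchDbSketch {
    public static void main(String[] args) throws Exception {
        String auth = Base64.getEncoder()
                .encodeToString("admin:secret".getBytes(StandardCharsets.UTF_8));
        HttpRequest put = HttpRequest.newBuilder()
                .uri(URI.create("http://couch-node.example.com:5984/app_builder/doc-001"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(
                        "{\"type\":\"layout\",\"name\":\"order-entry\"}"))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(put, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body()); // 201 on create
    }
}
```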
  • 26. Deployment at Customer: Distributed Mode
The ChainSys Smart Data Platform is a highly distributed application with a highly scalable environment. Most of the nodes are horizontally and vertically scalable.
[Diagram: Distributed deployment: DMZ services VM (Apache HTTPD server, single sign-on, load balancer); web container cluster with n-node web page services, Collaborate services, and API Gateway; foundation services cluster (caching node, file/log services, and scheduling services with primary and secondary nodes); Smart Data Platform cluster with n-node application services plus layout build, design and process, and layout rendering nodes; database layer with a primary/secondary database cluster (metadata and datamart) and a versioning VM; Solr cluster (master and slave nodes with multiple cores); CouchDB cluster.]
  • 27. DMZ Nodes
Apache HTTPD is needed in a distributed environment as a load balancer, and is also used as a reverse proxy for access from outside the network; it is a mandatory node. The SSO node is needed only if Single Sign-On capability with a federated service is required.
Web Cluster
ChainSys recommends a minimum two-node web cluster for high availability, load balanced for better performance. This node is mandatory for the ChainSys Platform. The number of nodes is not restricted to two and can be scaled according to the concurrent usage of the application pages. The Collaborate node is generally a single node but can be configured for High Availability if needed.
Gateway Cluster
The API Gateway nodes are not mandatory; they are required only when application APIs need to be exposed outside the Platform. When deployed, ChainSys recommends a two-node cluster for high availability and load balancing under heavy API traffic. The number of nodes in the cluster is determined by API call volume and is not restricted to two.
Application Cluster
HAProxy or Apache HTTPD acts as the load balancer; all calls between the application nodes are handled based on the node configuration. If Apache HTTPD is used in the DMZ for reverse proxying, it is recommended to use HAProxy, or a separate Apache HTTPD, for internal routing. The number of nodes in the cluster is not restricted to two; individual application nodes can be scaled horizontally for load balancing according to processing and mission-critical needs. The Integration cluster is a mandatory deployment: all the other applications depend on it for their integration needs. The Visualization cluster is likewise mandatory: all the other applications depend on it for dashboards and reports.
  • 28. Data Storage Nodes
Generally, the PostgreSQL database is configured for High Availability as an active-passive instance; depending on the number of read/write operations, it can be load balanced too. It can be replaced by Oracle 11g or higher if the client wants to use an existing database license.
A File Server is needed only if no NAS or SAN is available to mount shared disk space into the clusters for distributed logging; the NFS operations for distributed logging require this node.
An SVN server is mandatory, to store all the configuration objects in the repository for porting from one instance to another. It is generally a single node, as the load on it is not high.
Redis is used as the cache engine and is mandatory for a distributed deployment; it can be configured for high availability using master-slave replication.
SOLR is needed only if data cataloging is implemented and search capability is enabled. It can be configured for High Availability, and SOLR sharding can be used when the data is too large for one node or should be distributed to increase performance and throughput.
CouchDB is needed only if dynamic mobile applications are to be generated; it can be configured for high availability. For better performance, ChainSys recommends individual CouchDB instances for each active application.
The visualization application uses R Studio Server for its Machine Learning capabilities; it is needed only when Machine Learning algorithms are used.
When deploying MDM, the "Smart Application Builder" node is needed for dynamic layout generation and augmentation. The reverse does not apply, as "Smart Application Builder" does not depend on the MDM nodes.
NodeJS is needed only when mobile applications are to be generated dynamically; the Apache HTTPD server handles its load balancing.
The Scheduler cluster is needed if even one application uses the scheduling capability. This cluster is for High Availability (failover) only, not load balancing, and the number of nodes is restricted to two.
  • 29. Deployment at Customer: Single Node
[Diagram: Single-node deployment: DMZ services VM (Apache HTTPD server, single sign-on); an application services VM running the web services (Apache Tomcat 9, Apache ActiveMQ) together with the foundation services (caching, Collaborate, scheduling, file/log) and the Smart Data Platform, Smart App Builder, Smart BOTS, analytics, design and process, layout build and render, and catalog services; separate VMs for indexing, NoSQL, versioning, and the metadata/datamart database.]
"Single Node" is not meant literally. It means that all application services of the ChainSys Platform are deployed on a single node or server, while the data storage nodes remain separate servers or nodes. This type of installation is generally for a patching environment without many operations; it is also recommended for non-mission-critical development activities where high availability and scalability are not determining factors.
  • 30. DMZ Nodes
Apache HTTPD is needed only if a reverse proxy is required for access from outside the network; it is not a mandatory node for a single-node installation. The SSO node is needed only if Single Sign-On capability with a federated service is required.
Application Server
There is just one Apache Tomcat as the web application service, and it is not configured for high availability. The Collaborate service runs Apache ActiveMQ and the Spring integration service. The API Gateway is required only if objects are to be published as REST APIs or SOAP services; this service can be shut down if not needed. The Integration, Visualization, and Scheduler services must be running; the remaining applications run or stay shut down depending on the license and need.
Data Storage Nodes
PostgreSQL runs on a separate node; ChainSys does not recommend having the applications and the databases on the same machine. An SVN server is mandatory, to store all the configuration objects in the repository for porting from one instance to another. SOLR is needed only if data cataloging is implemented and search capability is enabled. CouchDB, as a separate node, is needed only if dynamic mobile applications are to be generated.
  • 31. Deployment at Customer: Instance Strategy
[Diagram: instance propagation: DEV (with its metadata DB) to TST/QA (with its metadata DB) to PRD (with its metadata DB).]
Generally, the instance propagation strategy above is recommended. Depending on the applications in use and the load, either a single-node deployment or a distributed-model deployment can be chosen; a distributed deployment is generally recommended for Production instances. The adapters are forward-propagated using the SVN repository, and the instances need not all follow the same deployment model. For reverse propagation, from Production to non-Production instances, the application and data storage layers can be cloned and the node configurations re-configured for the lower instances.
Built-in configuration management supports check-in and check-out without leaving the ChainSys Platform:
- It provides a solid software development lifecycle process for your projects.
- All your work is protected in a secure location and backed up regularly.
  • 32. Pure Cloud Deployment
The ChainSys Platform is available on the cloud. The Platform is hosted as a Public Cloud and also offers Private Cloud options.
[Diagram: Pure cloud topology: a public-cloud virtual network and a private-cloud virtual network, each with Dev and Prod DMZ subnets (Apache HTTPD server), application subnets, and per-tenant data subnets; per-tenant gateways connect the on-premise networks through site-to-site tunnels.]
  • 33. Public Cloud
Site-to-site tunneling handles connectivity between the tenant's data center and the ChainSys data center, and individual gateway routers can be provisioned per tenant. Tenants share the same application and DMZ node clusters, but not the data storage nodes. If a tenant needs a dedicated application node for higher workloads, that particular application node set can be reserved for that specific tenant. As mentioned earlier in the Database section, multi-tenancy is handled at the database level: tenants have separate database instances, provisioned based on the license and the subscription. Depending on the workload on the nodes, each node can be clustered to balance the load.
Private Cloud
Customers (tenants) have all applications, DMZ nodes, and data storage nodes assigned to the specific tenant; nothing is shared. Depending on the workload on the nodes, each node can be clustered to balance the load. The application nodes and databases are provisioned based on the license and subscription.
  • 34. Hybrid Cloud Deployment
The hybrid model can be combined with either the private or the public cloud. An Agent is deployed on the client organization's premises or data center to access the endpoints, which avoids creating a site-to-site tunnel between the client data center and the ChainSys cloud data center. A proxy (Apache HTTPD server) sits on both sides, in the ChainSys data center and in the client data center, and all back-and-forth communication between the ChainSys data center and the Agent is routed through the proxy only. The ChainSys cloud sends the Agent instructions to start a task, along with the task information; the Agent executes the task and sends the response, with the task's status, back to the cloud. Agents are available for dataZap, dataZense, and Smart BOTS. For dataZap, an existing database (either PostgreSQL or Oracle) can be used for the staging process; the Agent executes all integration and migration tasks by connecting directly to the source and target systems, validating and transforming data, and transferring data between them. For dataZen and Smart App Builder, data is streamed to the ChainSys Platform for manipulation.
[Diagram: Hybrid deployment: the ChainSys data center (DMZ nodes with Apache HTTPD and single sign-on, web nodes on Apache Tomcat 9, Collaborate server, API Gateway, application cluster nodes for analytics, catalog, design and process, layout build, and application deployment, scheduler node, file/log server, and data stores for metadata, versioning, app data, data mart, indexing, and caching) connected through proxies to the client data center, which runs the agent executables and a datamart database against the local endpoints.]
  • 35. Disaster Recovery
All the application nodes and web nodes are replicated using RSYNC; the specific install directory and any other log directories are synced to the secondary replication nodes. For PostgreSQL, the streaming replication feature is used, which relies on archive log shipping. SOLR ships with a built-in CDCR (Cross Data Center Replication) feature that can be used for disaster recovery. CouchDB has an outstanding replication architecture, replicating the primary database to the secondary database. The RPO can be set as needed, individually for both applications and databases. The RTO for the DR would be approximately an hour.
[Diagram: primary application and DB nodes replicating to the secondary site: application nodes via RSYNC, PostgreSQL via streaming replication with archive log shipping, SOLR via CDC replication, CouchDB via its native replication.]
  • 36. Application Monitoring
Third-Party Monitoring Tools
ChainSys uses third-party open-source monitoring tools such as Zabbix and Jenkins to monitor all the nodes. Zabbix supports tracking the availability and performance of the servers, virtual machines, applications (such as Apache, Tomcat, ActiveMQ, and Java), and databases (such as PostgreSQL and Redis) used in the Platform. Using Zabbix, the following are achieved:
- Various data collection methods and protocols
- Instant monitoring of all metrics using out-of-the-box templates
- Flexible trigger expressions and trigger dependencies
- Proactive network monitoring
- Remote command execution
- Flexible notifications
- Integration with external applications using the Zabbix API
The individual application monitoring systems can also be used for more in-depth analysis, but an integrated approach to looking into problems helps us be proactive and faster.
  • 37. In-Built Monitoring System
ChainSys is working on its own application monitoring tool that tracks essential parameters such as CPU and memory. The tool is also planned to monitor individual threads within the application, and is intended to handle most maintenance activities, such as patching, cloning, and database maintenance, from one single toolset. It will be integrated with Zabbix for monitoring and alerting.
  • 38. Supported Endpoints (Partial)
Cloud Applications: Oracle Sales Cloud, Oracle Marketing Cloud, Oracle Engagement Cloud, Oracle CRM On Demand, SAP C/4HANA, SAP S/4HANA, SAP BW, SAP Concur, SAP SuccessFactors, Salesforce, Microsoft Dynamics 365, Workday, Infor Cloud, Procore, Planview Enterprise One
PLM, MES & CRM: Windchill PTC, Oracle Agile PLM, Oracle PLM Cloud, Teamcenter, SAP PLM, SAP Hybris, SAP C/4HANA, Enovia, Proficy, Honeywell OptiVision, Salesforce Sales, Salesforce Marketing, Salesforce CPQ, Salesforce Service, Oracle Engagement Cloud, Oracle Sales Cloud, Oracle CPQ Cloud, Oracle Service Cloud, Oracle Marketing Cloud, Microsoft Dynamics CRM
HCM & Supply Chain Planning: Oracle HCM Cloud, SAP SuccessFactors, Workday, ICON, SAP APO and IBP, Oracle Taleo, Oracle Demantra, Oracle ASCP, Steelwedge
Project Management & EAM: Oracle Primavera, Oracle Unifier, SAP PM, Procore, Ecosys, Oracle EAM Cloud, Oracle Maintenance Cloud, JD Edwards EAM, IBM Maximo
Enterprise Storage Systems: OneDrive, Box, SharePoint, File Transfer Protocol (FTP), Oracle Webcenter, Amazon S3
Big Data: HIVE, Apache Impala, Apache HBase, Snowflake, MongoDB, Elasticsearch, SAP HANA, Hadoop, Teradata, Oracle Database, Redshift, BigQuery
NoSQL Databases: MongoDB, Solr, CouchDB, Elasticsearch
Databases: PostgreSQL, Oracle Database, SAP HANA, SYBASE, DB2, SQL Server, MySQL, MemSQL
Message Broker: IBM MQ, ActiveMQ
Development Platform: Java, .Net, Oracle PaaS, Force.com, IBM, ChainSys Platform
Enterprise Applications: Oracle E-Business Suite, Oracle ERP Cloud, Oracle JD Edwards, Oracle PeopleSoft, SAP S/4HANA, SAP ECC, IBM Maximo, Workday, Microsoft Dynamics, Microsoft Dynamics GP, Microsoft Dynamics Nav, Microsoft Dynamics Ax, Smart ERP, Infor, BaaN, Mapics, BPICS
  • 39. One End-to-End Platform for your Data Management needs: Data Migration, Data Reconciliation, Data Integration, Data Quality Management, Data Governance, Analytical MDM, Data Analytics, Data Catalog, and Data Security & Compliance. www.chainsys.com