AWS_Data_Pipeline

AWS Data Pipeline
~ Ahasan Habib
Technical Project Manager,
Ixora Solutions Ltd.
Dhaka, Bangladesh

What is AWS Data Pipeline?
● Web service
● Data movement and transformation
● Data-driven workflows

Benefits
● Sequence, schedule, run, and manage recurring data processing workloads reliably
● Cost effective
● Easy to design ETL
● Supports both structured and unstructured data
● Supports on-premises and cloud data sources

Data Pipeline Components
● Pipeline Definition
● Pipeline (schedules and runs tasks)
● Task Runner

Data Pipeline Objects
● ShellCommandActivity
{
  "id" : "CreateDirectory",
  "type" : "ShellCommandActivity",
  "command" : "mkdir new-directory"
}
● S3DataNode
{
  "id" : "OutputData",
  "type" : "S3DataNode",
  "schedule" : { "ref" : "CopyPeriod" },
  "filePath" : "s3://myBucket/#{@scheduledStartTime}.csv"
}

● Ec2Resource
{
  "id" : "MyEC2Resource",
  "type" : "Ec2Resource",
  "actionOnTaskFailure" : "terminate",
  "actionOnResourceFailure" : "retryAll",
  "maximumRetries" : "1",
  "instanceType" : "m1.medium",
  "securityGroups" : [
    "test-group",
    "default"
  ],
  "keyPair" : "my-key-pair"
}
● Schedule
{
  "id" : "Hourly",
  "type" : "Schedule",
  "period" : "1 hours",
  "startDateTime" : "2012-09-01T00:00:00",
  "endDateTime" : "2012-10-01T00:00:00"
}
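
To make the Schedule object and the #{@scheduledStartTime} expression in the S3 data node more concrete, here is a minimal Python sketch that lists the period start times an hourly schedule would cover and the file path each one would map to. It is an illustration only, not how the service is implemented: the function name scheduled_start_times is hypothetical, the interval is assumed to be half-open (start included, end excluded), and the timestamp rendering is assumed to match the format used in startDateTime.

```python
from datetime import datetime, timedelta


def scheduled_start_times(start, end, period):
    """Yield the start of each period in the half-open range [start, end)."""
    current = start
    while current < end:
        yield current
        current += period


# Values taken from the "Hourly" Schedule object in the slides.
start = datetime(2012, 9, 1)
end = datetime(2012, 10, 1)
runs = list(scheduled_start_times(start, end, timedelta(hours=1)))

# Assumption: #{@scheduledStartTime} renders like the startDateTime field.
paths = [
    "s3://myBucket/{}.csv".format(t.strftime("%Y-%m-%dT%H:%M:%S"))
    for t in runs
]

print(len(runs))
print(paths[0])
print(paths[-1])
```

Under these assumptions, each hourly period in September 2012 gets its own output file, which is why the S3 data node refers to a schedule.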

Works with Other AWS Services
● Amazon DynamoDB
● Amazon RDS
● Amazon Redshift
● Amazon S3
● Amazon EC2

Accessing Data Pipeline
● AWS Management Console
● AWS CLI
● AWS SDK
● Query API
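
When using an SDK, pipeline objects are sent in a different shape than the definition-file JSON shown in these slides: each object becomes an id, a name, and a list of key/value fields, where a value is either a string or a reference. The Python sketch below shows one way to do that conversion. The helper to_api_object is hypothetical, and the sketch assumes the boto3 datapipeline client's put_pipeline_definition call, which takes such a list as pipelineObjects. The call itself appears only in a comment because it needs AWS credentials and an existing pipeline.

```python
def to_api_object(definition_object):
    """Convert one definition-file object into the id/name/fields shape."""
    fields = []
    for key, value in definition_object.items():
        if key in ("id", "name"):
            continue
        # A list value (e.g. securityGroups) becomes repeated fields.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "ref" in item:
                fields.append({"key": key, "refValue": item["ref"]})
            else:
                fields.append({"key": key, "stringValue": str(item)})
    return {
        "id": definition_object["id"],
        # Assumption: fall back to the id when no name is given.
        "name": definition_object.get("name", definition_object["id"]),
        "fields": fields,
    }


output_data = {
    "id": "OutputData",
    "type": "S3DataNode",
    "schedule": {"ref": "CopyPeriod"},
    "filePath": "s3://myBucket/#{@scheduledStartTime}.csv",
}

api_object = to_api_object(output_data)
print(api_object)

# With credentials configured, the result could be sent roughly like this:
#   client = boto3.client("datapipeline")
#   client.put_pipeline_definition(pipelineId=..., pipelineObjects=[api_object])
```

The AWS CLI accepts the definition-file format directly, so this conversion is mainly relevant when calling the API from code.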

Create Data Pipeline
● Compose Pipeline Definition objects in a file
● Definition File Structure
{
"id": "S3DataInput",
"type": "S3DataNode",
"schedule": {"ref": "TheSchedule"},
"filePath": "s3://bucket_name",
"myCustomField": "This is a custom value in a custom field.",
"my_customFieldReference": {"ref":"AnotherPipelineComponent"}
}
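
A definition file groups such objects in a top-level "objects" list, and every {"ref": ...} value is expected to name the id of another object in the same file. The Python sketch below composes a small definition from the objects shown in these slides and reports references that do not resolve. The helper unresolved_refs is hypothetical and only checks top-level fields; it is a local sanity check, not a substitute for the validation the service performs.

```python
import json


def unresolved_refs(objects):
    """Return (object id, field, target) for refs that match no object id."""
    ids = {obj["id"] for obj in objects}
    missing = []
    for obj in objects:
        for key, value in obj.items():
            if isinstance(value, dict) and "ref" in value:
                if value["ref"] not in ids:
                    missing.append((obj["id"], key, value["ref"]))
    return missing


objects = [
    {
        "id": "TheSchedule",
        "type": "Schedule",
        "period": "1 hours",
        "startDateTime": "2012-09-01T00:00:00",
        "endDateTime": "2012-10-01T00:00:00",
    },
    {
        "id": "S3DataInput",
        "type": "S3DataNode",
        "schedule": {"ref": "TheSchedule"},
        "filePath": "s3://bucket_name",
        "my_customFieldReference": {"ref": "AnotherPipelineComponent"},
    },
]

missing = unresolved_refs(objects)
definition_text = json.dumps({"objects": objects}, indent=2)

print(missing)
print(definition_text)
```

Here the check flags AnotherPipelineComponent, because the example never defines an object with that id.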

Notification
● Amazon SNS (Simple Notification Service)
● Push delivery
● Publish/subscribe model
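
In a pipeline definition, notifications are typically expressed as an SnsAlarm object that an activity references through a field such as onFail or onSuccess. The Python sketch below attaches a failure alarm to the CreateDirectory activity from the earlier slide. The topic ARN, the alarm id, the subject and message text, the role name, and the helper with_failure_alarm are all placeholders chosen for illustration.

```python
def with_failure_alarm(activity, alarm_id):
    """Return a copy of the activity that references the alarm on failure."""
    updated = dict(activity)
    updated["onFail"] = {"ref": alarm_id}
    return updated


# Placeholder values: replace the ARN and role with ones from your account.
failure_alarm = {
    "id": "FailureAlarm",
    "type": "SnsAlarm",
    "topicArn": "arn:aws:sns:us-east-1:123456789012:example-topic",
    "subject": "CreateDirectory failed",
    "message": "The CreateDirectory activity reported a failure.",
    "role": "DataPipelineDefaultRole",
}

create_directory = {
    "id": "CreateDirectory",
    "type": "ShellCommandActivity",
    "command": "mkdir new-directory",
}

monitored = with_failure_alarm(create_directory, failure_alarm["id"])
print(monitored)
```

Subscribers to the topic then receive the message by push delivery whenever the alarm fires.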

“There's a lot of difference between listening and hearing.”
~ G.K. Chesterton
THANK YOU
