From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep

Hands-on learning: Configure an Amazon Data Firehose stream

- [Instructor] In this lesson, we'll configure an Amazon Data Firehose stream. Note that Firehose is not on the free tier, so there will be a small charge of around 7 cents to process our test data. If you don't want to incur the cost, just watch me do the lab without doing it in your own account.

Okay, let's go over to the console, search for Firehose, and open the Data Firehose console. Click Create Firehose stream. For the source, choose Direct PUT; we're going to put some test data in ourselves. Notice that the other source options here are Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (MSK). For the destination, we're just going to send our test records to Amazon S3, but you can see we also have other options such as OpenSearch, Redshift, Datadog, Elastic, HTTP Endpoint, Snowflake, and many others. So let's choose Amazon S3. Let's give our Firehose stream a name; let's just call it "DEA."

This next section allows us to configure transformations for our records. So if your records aren't in the format you want for analysis, you can have Firehose convert them automatically. For example, you can configure a Lambda function to do any kind of transformation you want, or you can have JSON data automatically converted into Parquet or ORC.

For the destination settings, let's create a new bucket to store our test data. Click the Create button, and that opens the S3 console so we can create a bucket. I'm going to call mine "deastreamtest," then a dash followed by your initials or some random characters to make the name unique. We can leave all the other settings at their defaults and click Create bucket. Great. Now let's go back to the tab where we have Firehose open. This may take a few minutes to update, but let's see if we can find our bucket by clicking Browse and filtering for "dea." I've clicked the refresh button, and now I can see it: "deastreamtest-wade." There it is. Click Choose. There are some more options we can configure here, such as whether we want a newline delimiter, whether we want to change the default partitioning of the records in S3, whether we want a prefix in our bucket, and a separate prefix for error records. Okay, let's go ahead and create the Firehose stream. This could take a few minutes, so I'm going to pause and come back when it's done.

Okay, now the status is Active, so we can put some test data into the stream and have it delivered to S3. The console has a nice, easy way to deliver test data to the stream, so let's click on the arrow here. You can see the console will automatically send data to our Firehose stream; in this case, it sends simulated stock transactions, so as a stock price changes, the update appears in the stream. All we have to do now is start sending demo data. So the console is sending data to the stream right now, the stream is buffering it, and then it delivers it to our S3 bucket. Let's go to the S3 tab and open up our stream test bucket. Depending on the buffering settings, this can take a few minutes, so I'm going to click Refresh a couple of times. Again, it's going to take a few minutes, so I'm going to pause until I start seeing objects in the bucket. Okay, now I'm seeing objects in my bucket.
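If you'd rather drive this lab from code instead of the console, here's a minimal sketch using Python and boto3. It isn't shown in the video, and the bucket name, stream name, and IAM role ARN are placeholders you'd replace with your own; the role must already exist and allow Firehose to write to the bucket. The records mimic the console's simulated stock ticker data.

```python
import json
import time

import boto3

# Hypothetical names -- replace with your own unique bucket, stream
# name, and an existing IAM role that Firehose can assume.
BUCKET = "deastreamtest-yourinitials"
STREAM = "DEA"
ROLE_ARN = "arn:aws:iam::123456789012:role/firehose-s3-delivery-role"

s3 = boto3.client("s3")
firehose = boto3.client("firehose")

# Create the destination bucket (works as-is in us-east-1; other
# regions need a CreateBucketConfiguration with a LocationConstraint).
s3.create_bucket(Bucket=BUCKET)

# Create a Direct PUT stream that delivers to S3, mirroring the
# console defaults used in the lab.
firehose.create_delivery_stream(
    DeliveryStreamName=STREAM,
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": ROLE_ARN,
        "BucketARN": f"arn:aws:s3:::{BUCKET}",
    },
)

# Poll until the stream is ACTIVE before sending records.
while True:
    desc = firehose.describe_delivery_stream(DeliveryStreamName=STREAM)
    if desc["DeliveryStreamDescription"]["DeliveryStreamStatus"] == "ACTIVE":
        break
    time.sleep(20)

# Send a small batch of records shaped like the console's simulated
# stock transactions. The trailing newline acts as the record
# delimiter in the delivered S3 objects.
records = [
    {"TICKER_SYMBOL": "QXZ", "SECTOR": "TECHNOLOGY", "CHANGE": -0.25, "PRICE": 84.51},
    {"TICKER_SYMBOL": "AMZ", "SECTOR": "RETAIL", "CHANGE": 0.10, "PRICE": 132.07},
]
firehose.put_record_batch(
    DeliveryStreamName=STREAM,
    Records=[{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in records],
)
```

The Lambda transformation option mentioned earlier follows a simple contract: Firehose hands the function a batch of base64-encoded records, and each must come back with the same recordId, a result status, and the transformed data. This sketch illustrates that contract; the uppercasing is just a placeholder for whatever transformation you actually need.

```python
import base64

def lambda_handler(event, context):
    # Each incoming record must be returned with its original recordId.
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper()  # placeholder transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```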
Now that took around five minutes, so don't worry if you don't see anything right away; give it time and the objects will start appearing. Before we look at the objects, let me stop the test data so we don't incur any more charges. Then we'll go back to the S3 console and take a look at the data. You can see what's happened here: Firehose has automatically created some partitions in our bucket for us. This is today's date, the date I'm actually running this lab, and there's a time component as well. We have our first data object, and to view it, we need to download it. Here's what the file looks like; you can see it's just stock ticker data, as I said.

Okay, I'm going to go back to the console now and clean up this test data. First, we can delete the data from the bucket: click Delete and type "permanently delete." Then go back and delete the bucket itself; you just have to type the name of the bucket and click Delete bucket. Lastly, I'm going to delete the Firehose stream: click Firehose streams, then Delete, and once again type the name of the stream. Okay, that's it for the lab. We got the Firehose stream created, and we sent some test records to an S3 bucket using the stream.
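And if you scripted the lab with boto3, the same cleanup can be done from code. This is a minimal sketch assuming the hypothetical bucket and stream names from the earlier snippet; it empties the bucket first, since S3 won't delete a non-empty bucket.

```python
import boto3

BUCKET = "deastreamtest-yourinitials"  # hypothetical, as above
STREAM = "DEA"

s3 = boto3.client("s3")
firehose = boto3.client("firehose")

# Empty the bucket first; S3 refuses to delete a non-empty bucket.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        s3.delete_object(Bucket=BUCKET, Key=obj["Key"])

s3.delete_bucket(Bucket=BUCKET)

# Finally, delete the Firehose stream itself.
firehose.delete_delivery_stream(DeliveryStreamName=STREAM)
```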
