Cloud computing with Amazon Web Services, Part
      2: Amazon Simple Storage Service (S3)
      Reliable, flexible, and inexpensive storage and retrieval of your
      data

      Skill Level: Introductory


      Prabhakar Chaganti (prabhakar@ylastic.com)
      CTO
      Ylastic, LLC.



      19 Aug 2008


      In this series, learn about cloud computing using Amazon Web Services. Explore
      how the services provide a compelling alternative for architecting and building
      scalable, reliable applications. This article delves into the highly scalable and
      responsive services provided by Amazon Simple Storage Service (S3). Learn about
      tools for interacting with S3, and use code samples to experiment with a simple shell.


      Amazon Simple Storage Service
      Part 1 of this series introduced the building blocks of Amazon Web Services and
      explained how you can use this virtual infrastructure to build Web-scale systems.

      In this article, learn more about Amazon Simple Storage Service (S3). S3 is a highly
      scalable and fast Internet data-storage system that makes it simple to store and
      retrieve any amount of data, at any time, from anywhere in the world. You pay for
      the storage and bandwidth based on your actual usage of the service. There is no
      setup cost, minimum cost, or recurring overhead cost.

      Amazon provides the administration and maintenance of the storage infrastructure,
      leaving you free to focus on the core functions of your systems and applications. S3
      is an industrial-strength platform that is readily available for your data storage needs.
      It's great for:


Amazon Simple Storage Service (S3)
© Copyright IBM Corporation 1994, 2008. All rights reserved.                                Page 1 of 21
developerWorks®                                                                        ibm.com/developerWorks




               • Storing the data for your applications.
               • Personal or enterprise backups.
               • Quickly and cheaply distributing media and other bandwidth-guzzling
                 content to your customers.

     Valuable features of S3 include:

     Reliability
           S3 is designed to tolerate failures and to recover quickly, with
           minimal or no downtime. Amazon provides a service level agreement (SLA) to
           maintain 99.99 percent availability.

     Simplicity
         S3 is built on simple concepts and provides great flexibility for developing your
         applications. You can build more complex storage schemes, if needed, by
         layering additional functions on top of S3 components.

     Scalability
          The design provides a high level of scalability and allows an easy ramp-up in
          service when a spike in demand hits your Web-scale applications.

     Inexpensive
          S3 rates are very competitive with other enterprise and personal data-storage
          solutions on the market.

     The three basic concepts underpinning the S3 framework are buckets, objects, and
     keys.

     Buckets

     Buckets are the fundamental building blocks. Each object that is stored in Amazon
     S3 is contained within a bucket. Think of a bucket as analogous to a folder, or a
     directory, on the file system. One of the key distinctions between a file folder and a
     bucket is that each bucket and its contents are addressable using a URL. For
     example, if you have a bucket named "prabhakar," then it can be addressed using
      the URL http://prabhakar.s3.amazonaws.com.

     Each S3 account can contain a maximum of 100 buckets. Buckets cannot be nested
     within each other, so you can't create a bucket within a bucket. You can affect the
     geographical location of your buckets by specifying a location constraint when you
     create them. This will automatically ensure that any objects that you store within that
     bucket will be stored in that geographical location. At this time, you can locate your
     buckets in either the United States or the European Union. If you do not specify a
     location when creating the bucket, the bucket and its contents will be stored in the





      location closest to the billing address for your account.

      Bucket names need to conform to the following S3 requirements:

                • The name must start with a number or a letter.
                • The name must be between 3 and 255 characters.
                • A valid name can contain only lowercase letters, numbers, periods,
                  underscores, and dashes.
                • Though names can have numbers and periods, they cannot be in the IP
                  address format. You cannot name a bucket 192.168.1.254.
                 • The bucket namespace is shared among all of the accounts in S3,
                   so your bucket name must be unique across all of S3.
      Buckets that will contain objects to be served with addressable URLs must conform
      to the following additional S3 requirements:

                • The name of the bucket must not contain any underscores.
                • The name must be between 3 and 63 characters.
                 • The name cannot end with a dash. For example, myfavoritebucket-
                   is invalid.
                • There cannot be dashes next to periods in the name. my-.bucket.com is
                  invalid.
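The naming rules above lend themselves to a quick programmatic check. The following sketch is illustrative only; the class and method names are made up and are not part of any AWS or JetS3t API:

```java
import java.util.regex.Pattern;

// Hypothetical validator for the S3 bucket-naming rules described above.
public class BucketNames {
    // starts with a lowercase letter or digit; 3-255 characters total;
    // only lowercase letters, digits, periods, underscores, and dashes
    private static final Pattern BASIC =
        Pattern.compile("[a-z0-9][a-z0-9._-]{2,254}");
    // names must not be in IP address format, such as 192.168.1.254
    private static final Pattern IP_FORM =
        Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    public static boolean isValid(String name) {
        return BASIC.matcher(name).matches()
            && !IP_FORM.matcher(name).matches();
    }

    // extra rules for buckets served through addressable URLs
    public static boolean isDnsCompatible(String name) {
        return isValid(name)
            && name.length() <= 63
            && !name.contains("_")
            && !name.endsWith("-")
            && !name.contains("-.")
            && !name.contains(".-");
    }
}
```

For example, isDnsCompatible("media.yourdomain.com") returns true, while isDnsCompatible("my-.bucket.com") returns false because of the dash next to a period.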
      You can use a domain naming convention for your buckets, such as
      media.yourdomain.com, and thus map your existing Web domains or subdomains to
      Amazon S3. The actual mapping is done by adding a DNS CNAME entry that points
      back to S3. The big advantage of this scheme is that you can use your own
      domain name in the URLs for downloading files; the CNAME mapping translates
      your domain into the S3 address for your bucket. For example,
      http://media.yourdomain.com.s3.amazonaws.com becomes the friendlier URL
      http://media.yourdomain.com.

      Objects

      Objects contain the data that is stored within the buckets in S3. Think of an object as
      the file that you want to store. Each object that is stored is composed of two entities:
      data and metadata. The data is the actual thing that is being stored, such as a PDF
      file, Word document, a video file, and so on. The stored data also has associated
      metadata for describing the object. Some examples of metadata are the content type
      of the object being stored, the date the object was last modified, and any other
      metadata specific to you or your application. The metadata for an object is specified
      by the developer as key value pairs when the object is sent to S3 for storage.




     Unlike the limitation on the number of buckets, there are no restrictions on the
     number of objects. You can store an unlimited number of objects in your buckets,
     and each object can contain up to 5GB of data.

     The data in your publicly accessible S3 objects can be retrieved by HTTP, HTTPS,
     or BitTorrent. Distribution of large media files from your S3 account becomes very
     simple when using BitTorrent; Amazon will not only create the torrent for your object,
     it will also seed it!
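Retrieving the torrent is just a matter of appending the ?torrent query string to the object's ordinary URL. A tiny helper makes this concrete; the class name is made up for illustration:

```java
// Illustrative helper: S3 serves a .torrent file for a publicly readable
// object when ?torrent is appended to the object's regular GET URL.
public class TorrentUrls {
    public static String torrentUrl(String bucketName, String key) {
        return "http://" + bucketName + ".s3.amazonaws.com/" + key + "?torrent";
    }
}
```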

     Keys

     Each object stored within an S3 bucket is identified using a unique key. This is
     similar in concept to the name of a file in a folder on your file system. The file name
     within a folder on your hard drive must be unique. Each object inside a bucket has
     exactly one key. The name of the bucket and the key are together used to provide
     the unique identification for each object that is stored in S3.

     Every object within S3 is addressable using a URL that combines the S3 service
      URL, bucket name, and unique key. If you store an object with the key
      my_favorite_video.mov inside the bucket named prabhakar, that object can be
      addressed using the URL
      http://prabhakar.s3.amazonaws.com/my_favorite_video.mov.
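This addressing scheme is simple to reproduce in code. The sketch below is an illustrative class, not an AWS or JetS3t API; it assembles the URL and percent-encodes each path segment of the key, since keys can contain characters that are not URL-safe:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative builder for the addressable URL of an S3 object.
public class ObjectUrls {
    public static String url(String bucketName, String key) {
        try {
            // encode each path segment of the key; keep "/" separators intact
            String[] parts = key.split("/");
            StringBuilder path = new StringBuilder();
            for (String p : parts) {
                if (path.length() > 0) path.append('/');
                // URLEncoder encodes for forms, so swap "+" for "%20"
                path.append(URLEncoder.encode(p, "UTF-8").replace("+", "%20"));
            }
            return "http://" + bucketName + ".s3.amazonaws.com/" + path;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }
}
```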

     Though the concepts are simple, as shown in Figure 1, buckets, objects, and keys
     together provide a lot of flexibility for building your data storage solutions. You can
     leverage these building blocks to simply store data on S3, or use their flexibility to
     layer and build more complex storage and applications on top of S3 to provide
     additional functions.

     Figure 1. Conceptual view of S3








      Access logging
      Each S3 bucket can have access log records that contain details on each request for
      a contained object. The log records are turned off by default; you have to explicitly
      enable the logging for each Amazon S3 bucket that you want to track. An access log
      record contains a lot of detail about the request, including the request type, the
      resource requested, and the time and date that the request was processed.
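To give a feel for what a record looks like, the sketch below pulls three fields out of a log entry: the bracketed timestamp, the quoted request line, and the HTTP status. The layout here is simplified and the class is illustrative; real records carry many more fields (bytes sent, turnaround time, referrer, and so on), so consult the S3 documentation for the full format:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative extraction of a few fields from an S3 access log record.
public class AccessLogs {
    // ... [timestamp] ... "request line" status ...
    private static final Pattern ENTRY =
        Pattern.compile("\\[([^\\]]+)\\].*?\"([^\"]+)\"\\s+(\\d{3})");

    // returns { timestamp, request line, HTTP status }, or null if no match
    public static String[] parse(String record) {
        Matcher m = ENTRY.matcher(record);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```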

      The logs are provided in the S3 server access log format but can be easily





      converted into the Apache combined log format. They can then be parsed by any
      of the open source or commercial log-analysis tools, such as Webalizer, to
      produce human-readable reports and graphs. These reports can give you useful
      insight into the customer base accessing your files. See Resources for tools
      you can use for easier visualization of the S3 log records.


     Security
     Each bucket and object created in S3 is private to the user account creating them.
     You have to explicitly grant permissions to other users and customers for them to be
     able to see the list of objects in your S3 buckets or to download the data contained
     within them. Amazon S3 provides the following security features to protect your
     buckets and the objects in them.

     Authentication
         Ensures that the request is being made by the user that owns the bucket or
         object. Each S3 request must include the Amazon Web Services access key
         that uniquely identifies the user.

     Authorization
         Ensures that the user trying to access the resource has the permissions or
         rights to the resource. Each S3 object has an access control list (ACL)
         associated with it that explicitly identifies the grants and permissions for that
         resource.
         You can grant access to all Amazon Web Services users or to a specific user
         identified by e-mail address, or you can grant anonymous access to any user.


     Integrity
          Each S3 request must be digitally signed by the requesting user with an
          Amazon Web Services secret key. On receipt of the request, S3 will check the
          signature to ensure that the request has not been tampered with in transit.

     Encryption
         You can access S3 through the HTTPS protocol to ensure that the data is
         transmitted through an encrypted connection.

     Nonrepudiation
         Each S3 request is time-stamped and serves as proof of the transaction.

      Every REST request made to S3 must go through the following standard
      steps, which are essential to ensuring security:
               • The request and all needed parameters must be assembled into a string.
               • Your Amazon Web Services secret access key must be used to create a





                    keyed-HMAC (Hash Message Authentication Code) signature hash of the
                    request string.
                • This calculated signature is itself added as a parameter on the request.
                • The request is then forwarded to Amazon S3.
                • Amazon S3 will check to see if the provided signature is a valid
                  keyed-HMAC hash of the request.
                • If the signature is valid, then (and only then) Amazon S3 will process the
                  request.
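The signing steps above can be sketched in a few lines using the standard javax.crypto classes. Libraries such as JetS3t perform all of this for you; this standalone sketch only shows the HMAC-SHA1-then-Base64 step, and the key and string-to-sign used below are made-up values:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.Base64;

// Standalone sketch of the keyed-HMAC signature step described above.
public class RequestSigning {
    public static String sign(String secretKey, String stringToSign) {
        try {
            // keyed HMAC-SHA1 over the canonical request string
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes("UTF-8"), "HmacSHA1"));
            byte[] digest = mac.doFinal(stringToSign.getBytes("UTF-8"));
            // the signature travels on the request as Base64 text
            return Base64.getEncoder().encodeToString(digest);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```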



      Pricing
      The charges for S3 are calculated based on three criteria, with rates that
      vary by the geographic location of your buckets.

                • The total amount of storage space used, which includes the actual size of
                  your data content and the associated metadata. The units used by S3 for
                  determining the storage consumed are GB-Month. The number of bytes
                  of storage used by your account is computed every hour, and at the end
                  of the month it's converted into the storage used for the month. The table
                  below shows pricing for storage.
                    Location            Cost
                    United States       $0.15 per GB-Month of storage used
                    Europe              $0.18 per GB-Month of storage used


                • The amount of data or bandwidth transferred to and from S3. This
                  includes all data that is uploaded and downloaded from S3. There is no
                  charge for data transferred between EC2 and S3 buckets that are located
                  in the United States. Data transferred between EC2 and European S3
                  buckets is charged at the standard data transfer rate as shown below.
                    Location            Cost
                    United States       $0.100 per GB - all data transfer in
                                        $0.170 per GB - first 10TB/month data transfer out
                                        $0.130 per GB - next 40TB/month data transfer out
                                        $0.110 per GB - next 100TB/month data transfer out
                                        $0.100 per GB - data transfer out over 150TB/month
                    Europe              $0.100 per GB - all data transfer in
                                        $0.170 per GB - first 10TB/month data transfer out
                                        $0.130 per GB - next 40TB/month data transfer out
                                        $0.110 per GB - next 100TB/month data transfer out
                                        $0.100 per GB - data transfer out over 150TB/month


                 • The number of application programming interface (API) requests
                   performed. S3 charges a fee for each request made through the
                   interface: creating objects, listing buckets, listing objects, and
                   so on. There is no fee for deleting objects and buckets. The fees
                   again differ slightly based on the geographic location of the
                   bucket. The following table shows pricing for API requests.
                    Location            Cost
                    United States       $0.01 per 1,000 PUT, POST, or LIST requests
                                        $0.01 per 10,000 GET and all other requests
                                        No charge for delete requests
                    Europe              $0.012 per 1,000 PUT, POST, or LIST requests
                                        $0.012 per 10,000 GET and all other requests
                                        No charge for delete requests


      Check Amazon S3 for the latest price information. You can also use the AWS
      Simple Monthly Calculator for calculating your monthly usage costs for S3 and the
      other Amazon Web Services.
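As a back-of-the-envelope illustration of the US rates quoted above, the following sketch estimates a monthly bill. Only a few of the rate components are modeled (storage, upload bandwidth, and request fees); the tiered transfer-out rates are omitted for brevity, the class is illustrative, and these are 2008 prices, so treat it as an illustration rather than a billing tool:

```java
// Back-of-the-envelope estimate using the 2008 US rates quoted above.
public class S3Costs {
    static final double STORAGE_PER_GB_MONTH = 0.15;  // US storage
    static final double TRANSFER_IN_PER_GB   = 0.10;  // US upload bandwidth
    static final double PUT_PER_1000         = 0.01;  // PUT/POST/LIST requests
    static final double GET_PER_10000        = 0.01;  // GET and other requests

    public static double monthlyCost(double gbMonthsStored, double gbUploaded,
                                     long putRequests, long getRequests) {
        return gbMonthsStored * STORAGE_PER_GB_MONTH
             + gbUploaded * TRANSFER_IN_PER_GB
             + (putRequests / 1000.0) * PUT_PER_1000
             + (getRequests / 10000.0) * GET_PER_10000;
    }
}
```

For example, storing 100 GB-Months, uploading 10GB, and making 1,000 PUT and 10,000 GET requests would come to roughly $16 at these rates.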


      Getting started with Amazon Web Services and S3
      To start exploring S3, you will first need to sign up for an Amazon Web Services
      account. You will be assigned an Amazon Web Services account number and will
      receive the security access keys, along with the X.509 security certificate,
      that you will need when you start using the various libraries and tools for
      communicating with S3.

      All communication with any of the Amazon Web Services is through either the SOAP
      interface or the query/REST interface. The request messages that are sent through
      either of these interfaces must be digitally signed by the sending user to ensure that
      the messages have not been tampered with in transit, and that they are really
      originating from the sending user. This is the most basic part of using the Amazon
      Web Services APIs. Each request must be digitally signed and the signature
      attached to the request.

      Each Amazon Web Services user account is associated with the following security
      credentials:
                • An access key ID that identifies you as the person making requests
                  through the query/REST interface.
                • A secret access key that is used to calculate the digital signature when
                  you make requests through the query interface.
                 • An X.509 certificate (a public certificate and its private key)
                   for signing requests and authentication when using the SOAP
                   interface.

      You can manage your keys and certificate, regenerate them, view account activity
      and usage reports, and modify your profile information from Web Services Account
      information.






     After you successfully sign up for the Amazon Web Services account, you need to
     enable Amazon S3 service for your account using the following steps:


            1.    Log in to your Amazon Web Services account.

            2.    Navigate to the S3 home page.

            3.    Click on Sign Up For This Web Service on the right side of the page.

            4.    Provide the requested information and complete the sign-up process.

      Examples in this article use the query/REST interface to communicate with S3,
      so you will need your access keys. You can view them on your Web Services
      Account information page by selecting View Access Key Identifiers. You are now
      set up to use Amazon Web Services, and have enabled the S3 service for your
      account.


     Interacting with S3
     To learn about interacting with S3, you can use existing libraries available from
     Amazon or from third parties and independent developers. This article does not
     delve into the details of communication with S3, such as how to sign requests, how
     to build up the XML documents used for encapsulating the data, or the parameters
     sent to and received from S3. We'll let the libraries handle all of that for us, and use
     the higher-level interface they provide. You can review the S3 developer guide for
     more details.

     You'll use an open-source Java™ library named JetS3t to explore S3, and learn
     about its API by viewing small snippets of code. By the end of the article you'll collect
     and organize these snippets into something useful: a simple and handy S3 shell that
     you can use at any time to experiment and interact with S3.

     JetS3t

     JetS3t is an open source Java toolkit for interacting with S3. It is more than just a
     library. The distribution includes several very useful S3 related tools that can be
     used by typical S3 users as well as service providers who build applications on top
     of S3. JetS3t includes:

     Cockpit
         A GUI for managing the contents of an Amazon S3 account.

     Synchronize
         A command-line application for synchronizing directories on your computer with
         an Amazon S3 account.





      Gatekeeper
          A servlet that you can use to mediate access to Amazon S3 accounts.

      CockpitLite
          A lighter version of Cockpit that routes all its operations through a mediating
          gatekeeper service.

      Uploader
          A GUI that routes all its operations through a mediating gatekeeper service and
          can be used by service providers to provide access to their S3 accounts for
          customers.
       Download the latest release of JetS3t (see Resources).


      You can, of course, use one of these GUI applications for interacting with S3, but
      that won't be very helpful if you need to develop applications to interface with S3.
      You can download the complete source code for this article as a zipped archive,
      including a ready-to-go Netbeans project that you can import into your workspace.

      Connecting to S3

      JetS3t provides an abstract class named org.jets3t.service.S3Service that
      must be extended by classes that implement a specific interface, such as REST or
      SOAP. JetS3t provides two implementations you can use for connecting and
      interacting with S3:
                • org.jets3t.service.impl.rest.httpclient.RestS3Service
                  communicates with S3 through the REST interface.
                • org.jets3t.service.impl.soap.axis.SoapS3Service
                  communicates with S3 through the SOAP interface using Apache Axis
                  1.4.

      JetS3t uses a file named jets3t.properties to configure various parameters that are
      used while communicating with S3. The example in this article uses the default
      jets3t.properties that is shipped with the distribution. The JetS3t configuration guide
      has a detailed explanation of the parameters.

      In this article you'll use the RestS3Service to connect to S3. A new RestS3Service
      object can be created by providing your Amazon Web Services access keys in the
      form of an AWSCredentials object. Keep in mind that the code snippets in this
      article are for demonstrating the API. To run each snippet, you have to ensure that
      all the required class imports are present. Refer to the source in the download
      package for the right imports. Or, even simpler, you can import the provided
      Netbeans project into your workspace for easy access to all of the source code.

      Listing 1. Create a new RestS3Service






        String awsAccessKey = "Your AWS access key";
        String awsSecretKey = "Your AWS secret key";
        // use your AWS keys to create a credentials object
        AWSCredentials awsCredentials = new AWSCredentials(awsAccessKey, awsSecretKey);
        // create the service object with our AWS credentials
        S3Service s3Service = new RestS3Service(awsCredentials);


     Managing your buckets

      The concept of a bucket is encapsulated by the
      org.jets3t.service.model.S3Bucket class, which extends
      org.jets3t.service.model.BaseS3Object, the parent class for both
      buckets and objects in the JetS3t model. Each S3Bucket object provides a
      toString(), in addition to various accessor methods, that can be used
      to print the salient information for a bucket: its name, geographical
      location, creation date, owner's name, and any metadata associated
      with the bucket.

     Listing 2. List buckets

       // list all buckets in the AWS account and print info for each bucket.
       S3Bucket[] buckets = s3Service.listAllBuckets();
       for (S3Bucket b : buckets) {
          System.out.println(b);
       }


     You can create a new bucket by providing a unique name for it. The namespace for
     buckets is shared by all the user accounts, so sometimes finding a unique name can
     be challenging. You can also specify where you want the bucket and the objects that
     it will contain to be physically located.

     Listing 3. Create buckets

        // create a US bucket and print its info
        S3Bucket bucket = s3Service.createBucket(bucketName);
        System.out.println("Created bucket - " + bucketName + " - " + bucket);

        // create a EU bucket and print its info
        S3Bucket euBucket = s3Service.createBucket(euBucketName,
        S3Bucket.LOCATION_EUROPE);
        System.out.println("Created bucket - " + euBucketName + " - " + euBucket);


     You have to delete all the objects contained in the bucket prior to deleting the bucket
     or an exception will be raised. The RestS3Service class you have been using is
     fine for dealing with single objects. When you start dealing with multiple objects, it
     makes more sense to use a multithreaded approach to speed things up. JetS3t
     provides the org.jets3t.service.multithread.S3ServiceSimpleMulti
     class just for this purpose. You can wrap the existing s3Service object using this





      class and take full advantage of those multiprocessors. It comes in handy when you
      need to clear a bucket by deleting all the objects it contains.

      Listing 4. Delete a bucket

        // get the bucket
        S3Bucket bucket = getBucketFromName(s3Service, "my bucket");

        // create a multithreaded version of the REST service
        S3ServiceSimpleMulti s3ServiceMulti = new S3ServiceSimpleMulti(s3Service);

        // get all the objects from the bucket
        S3Object[] objects = s3Service.listObjects(bucket);
        // clear the bucket by deleting all its objects
        s3ServiceMulti.deleteObjects(bucket, objects);

        // delete the bucket - it must be empty first
        s3Service.deleteBucket(bucket);


      Each bucket is associated with an ACL that determines the permissions or grants for
      the bucket and the level of access provided to other users. You can retrieve the ACL
      and print the grants that are provided by it.

      Listing 5. Retrieve ACL for bucket

        // get the bucket
        S3Bucket bucket = getBucketFromName(s3Service, "my bucket");
        // get the ACL and print it
        AccessControlList acl = s3Service.getBucketAcl(bucket);
        System.out.println(acl);


      The default permissions on newly created buckets and objects make them private to
      the owner. You can modify this by changing the ACL for a bucket and granting a
      group of users permission to read, write, or have full control over the bucket.

      Listing 6. Make a bucket and its content public

        // get the bucket
        S3Bucket bucket = getBucketFromName(s3Service, "my bucket");
        // get the ACL
        AccessControlList acl = s3Service.getBucketAcl(bucket);
        // give everyone read access
        acl.grantPermission(GroupGrantee.ALL_USERS, Permission.PERMISSION_READ);
        // save changes back to S3
        bucket.setAcl(acl);
        s3Service.putBucketAcl(bucket);


      You can easily enable logging for a bucket and retrieve the current logging status.
      After logging is enabled, detailed access logs for each file in that bucket are stored





     in S3. Your S3 account will be charged for the storage space that is consumed by
     the logs.

     Listing 7. Logging for S3 buckets

        // get the bucket
        S3Bucket bucket = getBucketFromName(s3Service, bucketName);
        // is logging enabled?
        S3BucketLoggingStatus loggingStatus =
                s3Service.getBucketLoggingStatus(bucketName);
        System.out.println(loggingStatus);
        // enable logging
        S3BucketLoggingStatus newLoggingStatus = new S3BucketLoggingStatus();
        // set a prefix for your log files
        newLoggingStatus.setLogfilePrefix(logFilePrefix);
        // set the target bucket name
        newLoggingStatus.setTargetBucketName(bucketName);
        // give the log_delivery group permission to write log files
        // to the bucket and to read the bucket's ACL
        AccessControlList acl = s3Service.getBucketAcl(bucket);
        acl.grantPermission(GroupGrantee.LOG_DELIVERY, Permission.PERMISSION_WRITE);
        acl.grantPermission(GroupGrantee.LOG_DELIVERY,
                Permission.PERMISSION_READ_ACP);
        bucket.setAcl(acl);
        // save the changed ACL for the bucket to S3
        s3Service.putBucketAcl(bucket);
        // save the changes to the bucket logging
        s3Service.setBucketLoggingStatus(bucketName, newLoggingStatus, true);
        System.out.println("The bucket logging status is now enabled.");


     Managing your objects

      Each object contained in a bucket is represented by the
      org.jets3t.service.model.S3Object class. Each S3Object provides a
      toString() method that can be used to print the important details for an object:
               • Name of the key
               • Name of the containing bucket
               • Date the object was last modified
               • Any metadata associated with the object
     It also provides methods for accessing the various properties of an object along with
     its metadata.

     Listing 8. List objects

       // list objects in a bucket.
       S3Object[] objects = s3Service.listObjects(bucket);
        // print out the object details
        if (objects.length == 0) {
           System.out.println("No objects found");
        } else {
           for (S3Object o : objects) {
              System.out.println(o);
           }
        }


      You can filter the list of objects that are retrieved by providing a prefix to match.

      Listing 9. Filter the list of objects

       // list objects matching a prefix.
        S3Object[] filteredObjects = s3Service.listObjects(bucket, "myprefix", null);
       // print out the object details
       if (filteredObjects.length == 0) {
          System.out.println("No objects found");
       } else {
          for (S3Object o : filteredObjects) {
             System.out.println(o);
          }
       }


      Each object can have associated metadata, such as the content type, date modified,
      and so on. You can also associate your application-specific custom metadata with
      an object.

      Listing 10. Retrieve object metadata

       // get the bucket
       S3Bucket bucket = getBucketFromName(s3Service, bucketName);
        // get objects matching a prefix
        S3Object[] filteredObjects = s3Service.listObjects(bucket, "myprefix", null);
        if (filteredObjects.length == 0) {
           System.out.println("No matching objects found");
        } else {
           // get the metadata for multiple objects.
           S3Object[] objectsWithHeadDetails = s3ServiceMulti.getObjectsHeads(bucket,
              filteredObjects);
           // print out the metadata
           for (S3Object o : objectsWithHeadDetails) {
              System.out.println(o);
           }
       }
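
      Listing 10 reads metadata back from S3. When you attach your own application-specific
      metadata, S3 carries it on the wire as x-amz-meta-* HTTP headers. The following
      toolkit-neutral sketch shows that mapping; the helper class and method names are
      illustrative, not part of JetS3t, which performs this translation for you when you
      attach metadata to an S3Object before an upload.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class MetadataHeaders {
    // S3 transmits user-defined metadata as "x-amz-meta-" HTTP headers;
    // header names are case-insensitive, so keys are normalized to lowercase.
    static Map<String, String> toHeaders(Map<String, String> metadata) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            headers.put("x-amz-meta-" + entry.getKey().toLowerCase(Locale.ROOT),
                    entry.getValue());
        }
        return headers;
    }
}
```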


      Each newly created object is private by default. You can use JetS3t to generate a
      signed URL that anyone can use to download the object data. The URL can be made
      valid for only a specified duration, after which it automatically expires. The
      object itself remains private, but anyone you give the URL to can download it
      while the URL is valid.

      Listing 11. Generate a signed URL for object downloads

        // get the bucket
        S3Bucket bucket = getBucketFromName(s3Service, bucketName);
        // how long (in minutes) should this URL be valid?
        int duration = Integer.parseInt(tokens.nextToken());
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.MINUTE, duration);
        Date expiryDate = cal.getTime();
        // create the signed URL
        String url = S3Service.createSignedGetUrl(bucketName, objectKey,
                awsCredentials, expiryDate);
        System.out.println("You can use this public URL to access this file for the next "
                + duration + " min - " + url);


      S3 allows a maximum of 5GB per object in a bucket. If you have objects that are
      larger than this, you'll need to split them into multiple files, each no larger
      than 5GB, and then upload all of the parts to S3.
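
      Splitting a large file into fixed-size parts before upload is plain Java I/O;
      the following sketch (the class and method names are my own, not part of
      JetS3t) writes numbered .partN files, each no larger than the given size:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class FileSplitter {
    // Split source into consecutive parts of at most partSize bytes each,
    // named source.part0, source.part1, ... Returns the part files created.
    static List<Path> split(Path source, long partSize) throws IOException {
        List<Path> parts = new ArrayList<>();
        long total = Files.size(source);
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long offset = 0;
            int index = 0;
            while (offset < total) {
                long length = Math.min(partSize, total - offset);
                Path part = Paths.get(source.toString() + ".part" + index++);
                try (FileChannel out = FileChannel.open(part,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING)) {
                    long copied = 0;
                    // transferTo may copy fewer bytes than requested, so loop
                    while (copied < length) {
                        copied += in.transferTo(offset + copied, length - copied, out);
                    }
                }
                offset += length;
                parts.add(part);
            }
        }
        return parts;
    }
}
```

      Each part can then be uploaded with putObject() as in Listing 12, and the parts
      concatenated back together after they are downloaded.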

     Listing 12. Upload to S3

       // get the bucket
       S3Bucket bucket = getBucketFromName(s3Service, bucketName);
       // create an object with the file data
        File fileData = new File("/my_file_to_upload");
       S3Object fileObject = new S3Object(bucket, fileData);
       // put the data on S3
       s3Service.putObject(bucket, fileObject);
       System.out.println("Successfully uploaded object - " + fileObject);


      JetS3t provides a DownloadPackage class that makes it simple to associate the
      data from an S3 object with a local file and automatically save the data to it.
      You can use this feature to easily download objects from S3.

     Listing 13. Download from S3

       // get the bucket
       S3Bucket bucket = getBucketFromName(s3Service, bucketName);
       // get the object
       S3Object fileObject = s3Service.getObject(bucket, fileName);
       // associate a file with the object data
       DownloadPackage[] downloadPackages = new DownloadPackage[1];
       downloadPackages[0] = new DownloadPackage(fileObject,
                                   new File(fileObject.getKey()));
       // download objects to the associated files
       s3ServiceMulti.downloadObjects(bucket, downloadPackages);
       System.out.println("Successfully retrieved object to current directory");


     This section covered some of the basic functions provided by the JetS3t toolkit, and
     how to use them to interact with S3. See Resources for more about S3 service and
     an in-depth discussion of the JetS3t toolkit.


      S3 Shell

      The interaction with S3 thus far, through small code snippets, can be put into a
      more useful and longer-lasting form by creating a simple S3 Shell program that
      you can run from the command line. You'll create a simple Java program that
      accepts the Amazon Web Services access key and secret key as parameters and
      presents a console prompt. You can then type a letter or a few letters, such as
      b for listing buckets or om for listing objects that match a certain prefix. Use
      this program for experimentation.
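
      The heart of such a shell is a small read-dispatch loop. A minimal sketch of the
      dispatch step follows; only the b, om, and h commands are mentioned in this
      article, so the q entry and the action strings are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

public class S3ShellDispatch {
    // Map of shorthand commands to the actions the shell would perform.
    private static final Map<String, String> COMMANDS = new LinkedHashMap<>();
    static {
        COMMANDS.put("b", "list all buckets");
        COMMANDS.put("om", "list objects matching a prefix");
        COMMANDS.put("h", "print the list of supported commands");
        COMMANDS.put("q", "quit the shell"); // illustrative
    }

    // Resolve one line of console input to the action it names.
    static String dispatch(String line) {
        StringTokenizer tokens = new StringTokenizer(line.trim());
        if (!tokens.hasMoreTokens()) {
            return "no command entered";
        }
        String command = tokens.nextToken().toLowerCase(Locale.ROOT);
        String action = COMMANDS.get(command);
        return action != null ? action : "unknown command; type h for help";
    }
}
```

      In the real shell, each command would invoke the corresponding JetS3t snippet
      from this article, with the remaining tokens supplying arguments such as the
      bucket name or prefix, instead of returning a description.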

      The shell program contains a main() that is filled out with an implementation using
      the snippets of code you're using in this article. In the interest of space, the code
      listing for S3 Shell is not included here. The complete S3 Shell source code, along
      with its dependencies, is in the download. You can run the shell by simply executing
      the devworks-s3.jar file.

      Listing 14. Running the S3 Shell

       java -jar devworks-s3.jar my_aws_access_key my_aws_secret_key


      You can type h at any time in the S3 Shell to get a list of supported commands.

      Figure 2. Help in the S3 Shell




      Some of the more useful operations have been added to the S3 Shell. You can
      extend it with any other functions you want, to make the shell even more useful
      for your specific needs.


     Summary
      In this article you learned some of the basic concepts behind Amazon's S3
      service. The JetS3t toolkit is an open source library you can use to interact
      with S3. You also learned how to build a simple S3 Shell from the sample
      snippets of code, so you can continue experimenting with S3 easily from the
      command line.

     Stay tuned for the next article in this series, which will explain how to use Amazon
     Elastic Compute Cloud (EC2) to run virtual servers in the cloud.





      Downloads
       Description                      Name              Size      Download method
       Sample code for this article     devworks-s3.zip   2.93MB    HTTP

       Information about download methods





     Resources
     Learn
        • Learn about specific Amazon Web Services:
               • Amazon Simple Storage Service (S3)
               • Amazon Elastic Compute Cloud (EC2)
               • Amazon Simple Queue Service (SQS)
               • Amazon SimpleDB (SDB)
               • The Service Health Dashboard is updated by the Amazon team regarding
                 any issues with the services.

        • Sign up for an Amazon Web Services account.
        • The Amazon Web Services Developer Connection is the gateway to all the
          developer resources.
        • Read the blog to find out the latest happenings in the world of Amazon Web
          Services.
        • From the Web Services Account information page you can manage your keys
          and certificate, regenerate them, view account activity and usage reports, and
          modify your profile information.
        • S3 Technical Resources has Amazon Web Services technical documentation,
          user guides, and other articles of interest.
        • Amazon S3 has the latest pricing information. Use the AWS Simple Monthly
          Calculator tool for calculating your monthly usage costs for S3 and the other
          Amazon Web Services.
        • Review the S3 Developer Guide for more details.
         • Read the Amazon Service Level Agreement (SLA) for S3.
         • The S3stats resource page has several links on processing and viewing S3 log
           records. Logs are in the S3 Server Access Log Format, but can easily be
           converted into the Apache Combined Log Format and then parsed by any of the
           open source or commercial log analysis tools, such as Webalizer.
        • Learn about JetS3t, an open source Java toolkit for Amazon S3, developed by
          James Murty. See the toolkit documentation, and get detailed explanations of
          parameters in the configuration guide.
        • In the Architecture area on developerWorks, get the resources you need to
          advance your skills in the architecture arena.


         • Browse the technology bookstore for books on these and other technical topics.
      Get products and technologies
         • Download JetS3t and other tools.
         • Download IBM product evaluation versions and get your hands on application
           development tools and middleware products from IBM® DB2®, Lotus®,
           Rational®, Tivoli®, and WebSphere®.
      Discuss
         • Check out developerWorks blogs and get involved in the developerWorks
           community.



      About the author
      Prabhakar Chaganti
      Prabhakar Chaganti is the CTO of Ylastic, a start-up that is building a single unified
      interface to architect, manage, and monitor a user's entire AWS Cloud computing
      environment: EC2, S3, SQS and SimpleDB. He is the author of two recent books,
      Xen Virtualization and GWT Java AJAX Programming. He is also the winner of the
      community choice award for the most innovative virtual appliance in the VMware
      Global Virtual Appliance Challenge.




      Trademarks
      IBM, the IBM logo, ibm.com, DB2, developerWorks, Lotus, Rational, Tivoli, and
      WebSphere are trademarks or registered trademarks of International Business
      Machines Corporation in the United States, other countries, or both. These and other
      IBM trademarked terms are marked on their first occurrence in this information with
      the appropriate symbol (® or ™), indicating US registered or common law
      trademarks owned by IBM at the time this information was published. Such
      trademarks may also be registered or common law trademarks in other countries. A
      current list of IBM trademarks is available on the Web at
      http://www.ibm.com/legal/copytrade.shtml
      Java and all Java-based trademarks and logos are trademarks of Sun Microsystems,
      Inc. in the United States, other countries, or both.




Amazon Simple Storage Service (S3)
© Copyright IBM Corporation 1994, 2008. All rights reserved.                              Page 21 of 21

More Related Content

PPTX
AWS Simple Storage Service (s3)
PDF
AWS S3 and GLACIER
PPTX
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
PDF
AWS S3 Tutorial For Beginners | Edureka
PDF
Amazon S3 Overview
PPTX
Introduction to Amazon S3
PPTX
ABCs of AWS: S3
PPT
Intro to Amazon S3
AWS Simple Storage Service (s3)
AWS S3 and GLACIER
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 Tutorial For Beginners | Edureka
Amazon S3 Overview
Introduction to Amazon S3
ABCs of AWS: S3
Intro to Amazon S3

What's hot (6)

PPT
Amazon S3 and EC2
PPTX
AWS S3 masterclass
PPTX
Aws s3 security
PPT
S3 and Glacier
PPTX
AWS Storage - S3 Fundamentals
PPSX
Amazon ec2 s3 dynamo db
Amazon S3 and EC2
AWS S3 masterclass
Aws s3 security
S3 and Glacier
AWS Storage - S3 Fundamentals
Amazon ec2 s3 dynamo db
Ad

Viewers also liked (20)

PDF
Java Standard Edition 6 Performance
PDF
Usability Performance Benchmarks
PDF
Achieving Compliance and Control of Software-as-a-Service and Cloud-Based App...
PDF
Enterprise Social Media: Trends in Adopting Web 2.0 for the Enterprise in 2007
PDF
Managed Cloud Computing: How Service Delivery Changing for the Supplier and t...
PDF
Cloud Computing for SMBs
PDF
How Blogs and Social Media are Changing Public Relations and the Way it is Pr...
PDF
Platform Migration Guide
PDF
Java 2D API: Enhanced Graphics and Imaging for the Java Platform
PDF
Java Tuning White Paper
PDF
Enterprise Social Media: Trends and Best Practices in Adopting Web 2.0 in 2008
PDF
Secure Computing With Java
PDF
Web Site Usability
PDF
Social Media: Turning The Internet Into A Conversation
PDF
Java Standard Edition 6 Performance
ZIP
Introduction to the Java(TM) Advanced Imaging API
DOCX
Sandeep_Kadoor_Resume
DOC
Borusu Ramanjaneyulu
DOCX
JPA Resume
DOCX
Akash resume
Java Standard Edition 6 Performance
Usability Performance Benchmarks
Achieving Compliance and Control of Software-as-a-Service and Cloud-Based App...
Enterprise Social Media: Trends in Adopting Web 2.0 for the Enterprise in 2007
Managed Cloud Computing: How Service Delivery Changing for the Supplier and t...
Cloud Computing for SMBs
How Blogs and Social Media are Changing Public Relations and the Way it is Pr...
Platform Migration Guide
Java 2D API: Enhanced Graphics and Imaging for the Java Platform
Java Tuning White Paper
Enterprise Social Media: Trends and Best Practices in Adopting Web 2.0 in 2008
Secure Computing With Java
Web Site Usability
Social Media: Turning The Internet Into A Conversation
Java Standard Edition 6 Performance
Introduction to the Java(TM) Advanced Imaging API
Sandeep_Kadoor_Resume
Borusu Ramanjaneyulu
JPA Resume
Akash resume
Ad

Similar to Cloud Computing With Amazon Web Services, Part 2: Storage in the Cloud With Amazon Simple Storage Service (S3) (20)

PDF
s3
PDF
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
PDF
Aws storage services whitepaper v9
PDF
Aws storage services whitepaper v9
PPTX
PPTX
Aws object storage and cdn(s3, glacier and cloud front) part 1
PDF
PDF
Introduction to Amazon Web Services
PDF
Amazon Glacier vs Amazon S3
PDF
Cloud Lesson_04_Amazon_Storage_Services.pdf
PPTX
S3inmule
PDF
AWS simple storage service
PPTX
Efficient and Secure Data Management with Cloud Storage
PPTX
Integrating with Aws s3
PPTX
Amazone s3 in mule
PPTX
Aws overview part 1(iam and storage services)
PPTX
Deep Dive on Amazon S3
PPTX
AWS Amazon S3 Mastery Bootcamp
PDF
My cool new Slideshow!
PDF
cdac@amitkumar@test123
s3
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Aws storage services whitepaper v9
Aws storage services whitepaper v9
Aws object storage and cdn(s3, glacier and cloud front) part 1
Introduction to Amazon Web Services
Amazon Glacier vs Amazon S3
Cloud Lesson_04_Amazon_Storage_Services.pdf
S3inmule
AWS simple storage service
Efficient and Secure Data Management with Cloud Storage
Integrating with Aws s3
Amazone s3 in mule
Aws overview part 1(iam and storage services)
Deep Dive on Amazon S3
AWS Amazon S3 Mastery Bootcamp
My cool new Slideshow!
cdac@amitkumar@test123

More from white paper (20)

PDF
Java Security Overview
PDF
Java Standard Edition 5 Performance
PDF
Java Standard Edition 6 Performance
PDF
Java Standard Edition 6 Performance
PDF
Java Standard Edition 6 Performance
PDF
Memory Management in the Java HotSpot Virtual Machine
PDF
J2 Se 5.0 Name And Version Change
PDF
Java Web Start
PDF
Java Apis For Imaging Enterprise-Scale, Distributed 2d Applications
PDF
* Evaluation of Java Advanced Imaging (1.0.2) as a Basis for Image Proce...
PDF
Concurrency Utilities Overview
PDF
Defining a Summative Usability Test for Voting Systems
PDF
The Effect of Culture on Usability
PDF
Principles of Web Usability I - Summer 2006
PDF
Principles of Web Usabilty II - Fall 2007
PDF
Put Social Media To Work For You
PDF
Six Myths and Realities of Blogging and Social Media in the Enterprise
PDF
Enterprise Social Media: Five Common Questions
PDF
Collaboration and Social Media 2008
PDF
Customer 2.0: The Business Implications of Social Media
Java Security Overview
Java Standard Edition 5 Performance
Java Standard Edition 6 Performance
Java Standard Edition 6 Performance
Java Standard Edition 6 Performance
Memory Management in the Java HotSpot Virtual Machine
J2 Se 5.0 Name And Version Change
Java Web Start
Java Apis For Imaging Enterprise-Scale, Distributed 2d Applications
* Evaluation of Java Advanced Imaging (1.0.2) as a Basis for Image Proce...
Concurrency Utilities Overview
Defining a Summative Usability Test for Voting Systems
The Effect of Culture on Usability
Principles of Web Usability I - Summer 2006
Principles of Web Usabilty II - Fall 2007
Put Social Media To Work For You
Six Myths and Realities of Blogging and Social Media in the Enterprise
Enterprise Social Media: Five Common Questions
Collaboration and Social Media 2008
Customer 2.0: The Business Implications of Social Media

Recently uploaded (20)

PPTX
Principles of Marketing, Industrial, Consumers,
PPTX
New Microsoft PowerPoint Presentation - Copy.pptx
PPTX
The Marketing Journey - Tracey Phillips - Marketing Matters 7-2025.pptx
PDF
Nidhal Samdaie CV - International Business Consultant
PDF
IFRS Notes in your pocket for study all the time
PDF
DOC-20250806-WA0002._20250806_112011_0000.pdf
PDF
kom-180-proposal-for-a-directive-amending-directive-2014-45-eu-and-directive-...
PDF
Business model innovation report 2022.pdf
PDF
Chapter 5_Foreign Exchange Market in .pdf
PDF
Laughter Yoga Basic Learning Workshop Manual
PDF
A Brief Introduction About Julia Allison
PDF
Types of control:Qualitative vs Quantitative
PDF
Stem Cell Market Report | Trends, Growth & Forecast 2025-2034
PDF
Unit 1 Cost Accounting - Cost sheet
PPTX
5 Stages of group development guide.pptx
PPTX
HR Introduction Slide (1).pptx on hr intro
DOCX
unit 2 cost accounting- Tender and Quotation & Reconciliation Statement
PPTX
AI-assistance in Knowledge Collection and Curation supporting Safe and Sustai...
DOCX
Euro SEO Services 1st 3 General Updates.docx
PDF
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi
Principles of Marketing, Industrial, Consumers,
New Microsoft PowerPoint Presentation - Copy.pptx
The Marketing Journey - Tracey Phillips - Marketing Matters 7-2025.pptx
Nidhal Samdaie CV - International Business Consultant
IFRS Notes in your pocket for study all the time
DOC-20250806-WA0002._20250806_112011_0000.pdf
kom-180-proposal-for-a-directive-amending-directive-2014-45-eu-and-directive-...
Business model innovation report 2022.pdf
Chapter 5_Foreign Exchange Market in .pdf
Laughter Yoga Basic Learning Workshop Manual
A Brief Introduction About Julia Allison
Types of control:Qualitative vs Quantitative
Stem Cell Market Report | Trends, Growth & Forecast 2025-2034
Unit 1 Cost Accounting - Cost sheet
5 Stages of group development guide.pptx
HR Introduction Slide (1).pptx on hr intro
unit 2 cost accounting- Tender and Quotation & Reconciliation Statement
AI-assistance in Knowledge Collection and Curation supporting Safe and Sustai...
Euro SEO Services 1st 3 General Updates.docx
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi

Cloud Computing With Amazon Web Services, Part 2: Storage in the Cloud With Amazon Simple Storage Service (S3)

  • 1. Cloud computing with Amazon Web Services, Part 2: Amazon Simple Storage Service (S3) Reliable, flexible, and inexpensive storage and retrieval of your data Skill Level: Introductory Prabhakar Chaganti (prabhakar@ylastic.com) CTO Ylastic, LLC. 19 Aug 2008 In this series, learn about cloud computing using Amazon Web Services. Explore how the services provide a compelling alternative for architecting and building scalable, reliable applications. This article delves into the highly scalable and responsive services provided by Amazon Simple Storage Service (S3). Learn about tools for interacting with S3, and use code samples to experiment with a simple shell. Amazon Simple Storage Service Part 1 of this series introduced the building blocks of Amazon Web Services and explains how you can use this virtual infrastructure to build Web-scale systems. In this article, learn more about Amazon Simple Storage Service (S3). S3 is a highly scalable and fast Internet data-storage system that makes it simple to store and retrieve any amount of data, at any time, from anywhere in the world. You pay for the storage and bandwidth based on your actual usage of the service. There is no setup cost, minimum cost, or recurring overhead cost. Amazon provides the administration and maintenance of the storage infrastructure, leaving you free to focus on the core functions of your systems and applications. S3 is an industrial-strength platform that is readily available for your data storage needs. It's great for: Amazon Simple Storage Service (S3) © Copyright IBM Corporation 1994, 2008. All rights reserved. Page 1 of 21
  • 2. developerWorks® ibm.com/developerWorks • Storing the data for your applications. • Personal or enterprise backups. • Quickly and cheaply distributing media and other bandwidth-guzzling content to your customers. Valuable features of S3 include: Reliability It is designed to tolerate failures and repair the system very quickly with minimal or no downtime. Amazon provides a service level agreement (SLA) to maintain 99.99 percent availability. Simplicity S3 is built on simple concepts and provides great flexibility for developing your applications. You can build more complex storage schemes, if needed, by layering additional functions on top of S3 components. Scalability The design provides a high level of scalability and allows an easy ramp-up in service when a spike in demand hits your Web-scale applications. Inexpensive S3 rates are very competitive with other enterprise and personal data-storage solutions on the market. The three basic concepts underpinning the S3 framework are buckets, objects, and keys. Buckets Buckets are the fundamental building blocks. Each object that is stored in Amazon S3 is contained within a bucket. Think of a bucket as analogous to a folder, or a directory, on the file system. One of the key distinctions between a file folder and a bucket is that each bucket and its contents are addressable using a URL. For example, if you have a bucket named "prabhakar," then it can be addressed using the URL http://prabhakar.s3.amazonaws.com. Each S3 account can contain a maximum of 100 buckets. Buckets cannot be nested within each other, so you can't create a bucket within a bucket. You can affect the geographical location of your buckets by specifying a location constraint when you create them. This will automatically ensure that any objects that you store within that bucket will be stored in that geographical location. At this time, you can locate your buckets in either the United States or the European Union. 
If you do not specify a location when creating the bucket, the bucket and its contents will be stored in the Amazon Simple Storage Service (S3) Page 2 of 21 © Copyright IBM Corporation 1994, 2008. All rights reserved.
  • 3. ibm.com/developerWorks developerWorks® location closest to the billing address for your account. Bucket names need to conform to the following S3 requirements: • The name must start with a number or a letter. • The name must be between 3 and 255 characters. • A valid name can contain only lowercase letters, numbers, periods, underscores, and dashes. • Though names can have numbers and periods, they cannot be in the IP address format. You cannot name a bucket 192.168.1.254. • The bucket namespace is shared among all buckets from all of the accounts in S3. Your bucket name must be unique across the entire S3. Buckets that will contain objects to be served with addressable URLs must conform to the following additional S3 requirements: • The name of the bucket must not contain any underscores. • The name must be between 3 and 63 characters. • The name cannot end with a dash. For example, myfavorite-.bucket.com is invalid. • There cannot be dashes next to periods in the name. my-.bucket.com is invalid. You can use a domain naming convention for your buckets, such as media.yourdomain.com, and thus map your existing Web domains or subdomains to Amazon S3. The actual mapping will be done when you add DNS CNAME entries to point back to S3. The big advantage with this scheme is that you can use your own domain name in your URLs to download files. The CNAME mapping will be responsible for translating between the S3 address for your bucket. For example, http://media.yourdomain.com.s3.amazonaws.com becomes the more friendly URL http://media.yourdomain.com. Objects Objects contain the data that is stored within the buckets in S3. Think of an object as the file that you want to store. Each object that is stored is composed of two entities: data and metadata. The data is the actual thing that is being stored, such as a PDF file, Word document, a video file, and so on. The stored data also has associated metadata for describing the object. 
Some examples of metadata are the content type of the object being stored, the date the object was last modified, and any other metadata specific to you or your application. The metadata for an object is specified by the developer as key value pairs when the object is sent to S3 for storage. Amazon Simple Storage Service (S3) © Copyright IBM Corporation 1994, 2008. All rights reserved. Page 3 of 21
  • 4. developerWorks® ibm.com/developerWorks Unlike the limitation on the number of buckets, there are no restrictions on the number of objects. You can store an unlimited number of objects in your buckets, and each object can contain up to 5GB of data. The data in your publicly accessible S3 objects can be retrieved by HTTP, HTTPS, or BitTorrent. Distribution of large media files from your S3 account becomes very simple when using BitTorrent; Amazon will not only create the torrent for your object, it will also seed it! Keys Each object stored within an S3 bucket is identified using a unique key. This is similar in concept to the name of a file in a folder on your file system. The file name within a folder on your hard drive must be unique. Each object inside a bucket has exactly one key. The name of the bucket and the key are together used to provide the unique identification for each object that is stored in S3. Every object within S3 is addressable using a URL that combines the S3 service URL, bucket name, and unique key. If you store an object with the key my_favorite_video.mov inside the bucket named prabhakar, that object can be addressed using the URL http://prabhakar.s3.amazonaws.com/ my_favorite_video.mov. Though the concepts are simple, as shown in Figure 1, buckets, objects, and keys together provide a lot of flexibility for building your data storage solutions. You can leverage these building blocks to simply store data on S3, or use their flexibility to layer and build more complex storage and applications on top of S3 to provide additional functions. Figure 1. Conceptual view of S3 Amazon Simple Storage Service (S3) Page 4 of 21 © Copyright IBM Corporation 1994, 2008. All rights reserved.
  • 5. ibm.com/developerWorks developerWorks® Access logging Each S3 bucket can have access log records that contain details on each request for a contained object. The log records are turned off by default; you have to explicitly enable the logging for each Amazon S3 bucket that you want to track. An access log record contains a lot of detail about the request, including the request type, the resource requested, and the time and date that the request was processed. The logs are provided in the S3 server access log format but can be easily Amazon Simple Storage Service (S3) © Copyright IBM Corporation 1994, 2008. All rights reserved. Page 5 of 21
  • 6. developerWorks® ibm.com/developerWorks converted into Apache combined log format. They can then be easily parsed by any of the open source or commercial log analysis tools, such as Webalizer, to give you a human readable report and pretty graphs upon request. The reports can be very useful to gain insight into your customer base that's accessing the files. See Resources for tools you can use for easier visualization of the S3 log records. Security Each bucket and object created in S3 is private to the user account creating them. You have to explicitly grant permissions to other users and customers for them to be able to see the list of objects in your S3 buckets or to download the data contained within them. Amazon S3 provides the following security features to protect your buckets and the objects in them. Authentication Ensures that the request is being made by the user that owns the bucket or object. Each S3 request must include the Amazon Web Services access key that uniquely identifies the user. Authorization Ensures that the user trying to access the resource has the permissions or rights to the resource. Each S3 object has an access control list (ACL) associated with it that explicitly identifies the grants and permissions for that resource. You can grant access to all Amazon Web Services users or to a specific user identified by e-mail address, or you can grant anonymous access to any user. Integrity Each S3 request must be digitally signed by the requesting user with an Amazon Web Services secret key. On receipt of the request, S3 will check the signature to ensure that the request has not been tampered with in transit. Encryption You can access S3 through the HTTPS protocol to ensure that the data is transmitted through an encrypted connection. Nonrepudiation Each S3 request is time-stamped and serves as proof of the transaction. 
Each REST request made to S3 must go through the following standard steps, which are essential to ensuring security:

• The request and all needed parameters must be assembled into a string.
• Your Amazon Web Services secret access key must be used to create a keyed-HMAC (Hash Message Authentication Code) signature hash of the request string.
• The calculated signature is itself added as a parameter on the request.
• The request is then forwarded to Amazon S3.
• Amazon S3 checks whether the provided signature is a valid keyed-HMAC hash of the request.
• If the signature is valid, then (and only then) Amazon S3 processes the request.

Pricing

The charges for S3 are calculated based on three criteria, with rates that differ based on the geographic location of your buckets.

• The total amount of storage space used, which includes the actual size of your data content and the associated metadata. The unit S3 uses for determining the storage consumed is the GB-Month. The number of bytes of storage used by your account is computed every hour, and at the end of the month it is converted into the storage used for the month. The table below shows pricing for storage.

Location         Cost
United States    $0.15 per GB-Month of storage used
Europe           $0.18 per GB-Month of storage used

• The amount of data or bandwidth transferred to and from S3. This includes all data that is uploaded to and downloaded from S3. There is no charge for data transferred between EC2 and S3 buckets that are located in the United States. Data transferred between EC2 and European S3 buckets is charged at the standard data transfer rates shown below.

Location         Cost
United States    $0.100 per GB - all data transfer in
                 $0.170 per GB - first 10TB / month data transfer out
                 $0.130 per GB - next 40TB / month data transfer out
                 $0.110 per GB - next 100TB / month data transfer out
                 $0.100 per GB - data transfer out / month over 150TB
Europe           $0.100 per GB - all data transfer in
                 $0.170 per GB - first 10TB / month data transfer out
                 $0.130 per GB - next 40TB / month data transfer out
                 $0.110 per GB - next 100TB / month data transfer out
                 $0.100 per GB - data transfer out / month over 150TB

• The number of application programming interface (API) requests performed. S3 charges a fee for each request made through the interface: creating objects, listing buckets, listing objects, and so on. There is no fee for deleting objects and buckets. The fees are once again slightly different based on the geographic location of the bucket. The following table shows pricing for API requests.

Location         Cost
United States    $0.01 per 1,000 PUT, POST, or LIST requests
                 $0.01 per 10,000 GET and all other requests
                 No charge for delete requests
Europe           $0.012 per 1,000 PUT, POST, or LIST requests
                 $0.012 per 10,000 GET and all other requests
                 No charge for delete requests

Check Amazon S3 for the latest price information. You can also use the AWS Simple Monthly Calculator to calculate your monthly usage costs for S3 and the other Amazon Web Services.

Getting started with Amazon Web Services and S3

To start exploring S3, you first need to sign up for an Amazon Web Services account. You will be assigned an Amazon Web Services account number and will get the security access keys, along with the x.509 security certificate, that are required when you start using the various libraries and tools for communicating with S3.

All communication with any of the Amazon Web Services is through either the SOAP interface or the query/REST interface. The request messages sent through either of these interfaces must be digitally signed by the sending user to ensure that the messages have not been tampered with in transit, and that they really originate from the sending user. This is the most basic part of using the Amazon Web Services APIs: each request must be digitally signed and the signature attached to the request.

Each Amazon Web Services user account is associated with the following security credentials:

• An access key ID that identifies you as the person making requests through the query/REST interface.
• A secret access key that is used to calculate the digital signature when you make requests through the query interface.
• Public and private x.509 certificates for signing requests and authentication when using the SOAP interface.

You can manage your keys and certificate, regenerate them, view account activity and usage reports, and modify your profile information from the Web Services Account information page.
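The signing steps described earlier can be sketched in plain Java with the standard javax.crypto API. This is a minimal illustration only: the exact canonical string to sign is defined by the S3 developer guide and is glossed over here, and the key and request string below are made-up values. (The JDK's Base64 class is used for encoding; older JDKs would need a library such as commons-codec.)

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RequestSigner {
    // Compute a Base64-encoded keyed-HMAC (SHA-1) signature of the
    // canonical request string, as S3's REST authentication scheme requires.
    public static String sign(String secretKey, String stringToSign) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] raw = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical canonical string for a GET of /mybucket/photo.jpg
        String stringToSign = "GET\n\n\nTue, 19 Aug 2008 12:00:00 GMT\n/mybucket/photo.jpg";
        String signature = sign("my_aws_secret_key", stringToSign);
        // The signature would then be attached to the request sent to S3
        System.out.println("Signature: " + signature);
    }
}
```

Libraries such as JetS3t, introduced below, perform exactly this computation for you on every request, which is why you never need to sign anything by hand.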
After you successfully sign up for the Amazon Web Services account, you need to enable the Amazon S3 service for your account using the following steps:

1. Log in to your Amazon Web Services account.
2. Navigate to the S3 home page.
3. Click Sign Up For This Web Service on the right side of the page.
4. Provide the requested information and complete the sign-up process.

Examples in this article use the query/REST interface to communicate with S3, so you will need your access keys. You can retrieve them from your Web Services Account information page by selecting View Access Key Identifiers. You are now set up to use Amazon Web Services and have enabled the S3 service for your account.

Interacting with S3

To interact with S3, you can use existing libraries available from Amazon or from third parties and independent developers. This article does not delve into the details of communicating with S3, such as how to sign requests, how to build the XML documents that encapsulate the data, or the parameters sent to and received from S3. We'll let a library handle all of that and use the higher-level interface it provides. You can review the S3 developer guide for more details.

You'll use an open source Java™ library named JetS3t to explore S3 and learn about its API by viewing small snippets of code. By the end of the article you'll collect and organize these snippets into something useful: a simple and handy S3 shell that you can use at any time to experiment and interact with S3.

JetS3t

JetS3t is an open source Java toolkit for interacting with S3. It is more than just a library: the distribution includes several very useful S3-related tools that can be used by typical S3 users as well as by service providers who build applications on top of S3. JetS3t includes:

Cockpit
A GUI for managing the contents of an Amazon S3 account.
Synchronize
A command-line application for synchronizing directories on your computer with an Amazon S3 account.
Gatekeeper
A servlet that you can use to mediate access to Amazon S3 accounts.

CockpitLite
A lighter version of Cockpit that routes all its operations through a mediating gatekeeper service.

Uploader
A GUI that routes all its operations through a mediating gatekeeper service and can be used by service providers to give customers access to their S3 accounts.

Download the latest release of JetS3t. You can, of course, use one of these GUI applications to interact with S3, but that won't be very helpful if you need to develop applications that interface with S3. You can download the complete source code for this article as a zipped archive, including a ready-to-go NetBeans project that you can import into your workspace.

Connecting to S3

JetS3t provides an abstract class named org.jets3t.service.S3Service that is extended by classes implementing a specific interface, such as REST or SOAP. JetS3t provides two implementations you can use for connecting to and interacting with S3:

• org.jets3t.service.impl.rest.httpclient.RestS3Service communicates with S3 through the REST interface.
• org.jets3t.service.impl.soap.axis.SoapS3Service communicates with S3 through the SOAP interface using Apache Axis 1.4.

JetS3t uses a file named jets3t.properties to configure various parameters used while communicating with S3. The examples in this article use the default jets3t.properties that ships with the distribution. The JetS3t configuration guide has a detailed explanation of the parameters.

In this article you'll use the RestS3Service to connect to S3. A new RestS3Service object is created by providing your Amazon Web Services access keys in the form of an AWSCredentials object. Keep in mind that the code snippets in this article are for demonstrating the API. To run each snippet, you have to ensure that all the required class imports are present.
Refer to the source in the download package for the right imports. Or, even simpler, import the provided NetBeans project into your workspace for easy access to all of the source code.

Listing 1. Create a new RestS3Service
String awsAccessKey = "Your AWS access key";
String awsSecretKey = "Your AWS secret key";

// use your AWS keys to create a credentials object
AWSCredentials awsCredentials = new AWSCredentials(awsAccessKey, awsSecretKey);

// create the service object with our AWS credentials
S3Service s3Service = new RestS3Service(awsCredentials);

Managing your buckets

The concept of a bucket is encapsulated by org.jets3t.service.model.S3Bucket, which extends the org.jets3t.service.model.BaseS3Object class, the parent class for both buckets and objects in the JetS3t model. Each S3Bucket object provides a toString(), in addition to various accessor methods, that can be used to print the salient information for a bucket: the name and geographic location of the bucket, the date the bucket was created, the owner's name, and any metadata associated with the bucket.

Listing 2. List buckets

// list all buckets in the AWS account and print info for each bucket
S3Bucket[] buckets = s3Service.listAllBuckets();
for (S3Bucket b : buckets) {
    System.out.println(b);
}

You can create a new bucket by providing a unique name for it. The namespace for buckets is shared by all user accounts, so sometimes finding a unique name can be challenging. You can also specify where you want the bucket, and the objects it will contain, to be physically located.

Listing 3. Create buckets

// create a US bucket and print its info
S3Bucket usBucket = s3Service.createBucket(usBucketName);
System.out.println("Created bucket - " + usBucketName + " - " + usBucket);

// create a EU bucket and print its info
S3Bucket euBucket = s3Service.createBucket(euBucketName, S3Bucket.LOCATION_EUROPE);
System.out.println("Created bucket - " + euBucketName + " - " + euBucket);

You have to delete all the objects contained in a bucket before deleting the bucket itself, or an exception will be raised. The RestS3Service class you have been using is fine for dealing with single objects.
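Because the bucket namespace is shared by everyone, it can pay to derive bucket names from something you control, such as a domain name, and to validate them before calling createBucket(). The helper below is a hypothetical sketch, not part of JetS3t; it encodes the commonly documented naming rules (3 to 63 characters; lowercase letters, digits, periods, and hyphens; starting and ending with a letter or digit), which you should verify against the current S3 documentation.

```java
public class BucketNames {
    // Validate a bucket name against the basic S3 naming rules assumed here:
    // 3-63 chars, lowercase letters, digits, '.', '-', and the first and
    // last characters must be a letter or digit.
    public static boolean isValid(String name) {
        return name != null
            && name.matches("[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]");
    }

    // Derive a likely-unique bucket name by prefixing with an identifier
    // you control, such as a domain name you own.
    public static String qualified(String prefix, String name) {
        return (prefix + "-" + name).toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(isValid("my-bucket"));                 // true
        System.out.println(isValid("My_Bucket"));                 // false: uppercase, underscore
        System.out.println(qualified("example.com", "Backups"));  // example.com-backups
    }
}
```

Checking names locally like this avoids a round trip to S3 just to learn that a name is malformed or already taken by you under a different spelling.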
When you start dealing with multiple objects, it makes more sense to use a multithreaded approach to speed things up. JetS3t provides the org.jets3t.service.multithread.S3ServiceSimpleMulti class for just this purpose. You can wrap the existing s3Service object with this
class and take full advantage of multiple processors. It comes in handy when you need to clear a bucket by deleting all the objects it contains.

Listing 4. Delete a bucket

// get the bucket
S3Bucket bucket = getBucketFromName(s3Service, "my bucket");

// delete a bucket - it must be empty first
s3Service.deleteBucket(bucket);

// create a multithreaded version of the REST service
S3ServiceSimpleMulti s3ServiceMulti = new S3ServiceSimpleMulti(s3Service);

// get all the objects from the bucket
S3Object[] objects = s3Service.listObjects(bucket);

// clear the bucket by deleting all its objects
s3ServiceMulti.deleteObjects(bucket, objects);

Each bucket is associated with an ACL that determines the permissions, or grants, for the bucket and the level of access provided to other users. You can retrieve the ACL and print the grants it provides.

Listing 5. Retrieve the ACL for a bucket

// get the bucket
S3Bucket bucket = getBucketFromName(s3Service, "my bucket");

// get the ACL and print it
AccessControlList acl = s3Service.getBucketAcl(bucket);
System.out.println(acl);

The default permissions on newly created buckets and objects make them private to the owner. You can modify this by changing the ACL for a bucket and granting a group of users permission to read, write, or have full control over the bucket.

Listing 6. Make a bucket and its content public

// get the bucket
S3Bucket bucket = getBucketFromName(s3Service, "my bucket");

// get the ACL
AccessControlList acl = s3Service.getBucketAcl(bucket);

// give everyone read access
acl.grantPermission(GroupGrantee.ALL_USERS, Permission.PERMISSION_READ);

// save changes back to S3
bucket.setAcl(acl);
s3Service.putBucketAcl(bucket);

You can easily enable logging for a bucket and retrieve the current logging status. After logging is enabled, detailed access logs for each file in that bucket are stored
in S3. Your S3 account will be charged for the storage space consumed by the logs.

Listing 7. Logging for S3 buckets

// get the bucket
S3Bucket bucket = getBucketFromName(s3Service, "my bucket");

// is logging enabled?
S3BucketLoggingStatus loggingStatus = s3Service.getBucketLoggingStatus(bucketName);
System.out.println(loggingStatus);

// enable logging
S3BucketLoggingStatus newLoggingStatus = new S3BucketLoggingStatus();

// set a prefix for your log files
newLoggingStatus.setLogfilePrefix(logFilePrefix);

// set the target bucket name
newLoggingStatus.setTargetBucketName(bucketName);

// give the log_delivery group permission to write logs and read the bucket's ACL
AccessControlList acl = s3Service.getBucketAcl(bucket);
acl.grantPermission(GroupGrantee.LOG_DELIVERY, Permission.PERMISSION_WRITE);
acl.grantPermission(GroupGrantee.LOG_DELIVERY, Permission.PERMISSION_READ_ACP);
bucket.setAcl(acl);

// save the changed ACL for the bucket to S3
s3Service.putBucketAcl(bucket);

// save the changes to the bucket logging
s3Service.setBucketLoggingStatus(bucketName, newLoggingStatus, true);
System.out.println("The bucket logging status is now enabled.");

Managing your objects

Each object contained in a bucket is represented by org.jets3t.service.model.S3Object. Each S3Object provides a toString() that can be used to print the important details for an object:

• Name of the key
• Name of the containing bucket
• Date the object was last modified
• Any metadata associated with the object

It also provides methods for accessing the various properties of an object, along with its metadata.

Listing 8. List objects

// list objects in a bucket
S3Object[] objects = s3Service.listObjects(bucket);

// print out the object details
if (objects.length == 0) {
    System.out.println("No objects found");
} else {
    for (S3Object o : objects) {
        System.out.println(o);
    }
}

You can filter the list of objects retrieved by providing a prefix to match.

Listing 9. Filter the list of objects

// list objects matching a prefix
S3Object[] filteredObjects = s3Service.listObjects(bucket, "myprefix", null);

// print out the object details
if (filteredObjects.length == 0) {
    System.out.println("No objects found");
} else {
    for (S3Object o : filteredObjects) {
        System.out.println(o);
    }
}

Each object can have associated metadata, such as the content type, the date modified, and so on. You can also associate your own application-specific custom metadata with an object.

Listing 10. Retrieve object metadata

// get the bucket
S3Bucket bucket = getBucketFromName(s3Service, bucketName);

// get objects matching a prefix
S3Object[] filteredObjects = s3Service.listObjects(bucket, "myprefix", null);
if (filteredObjects.length == 0) {
    System.out.println("No matching objects found");
} else {
    // get the metadata for multiple objects using the multithreaded service
    S3Object[] objectsWithHeadDetails = s3ServiceMulti.getObjectsHeads(bucket, filteredObjects);

    // print out the metadata
    for (S3Object o : objectsWithHeadDetails) {
        System.out.println(o);
    }
}

Each newly created object is private by default. You can use JetS3t to generate a signed URL that anyone can use to download the object data. The URL can be created to be valid only for a certain duration, at the end of which it automatically expires. The object remains private, but you can hand the URL to anyone to let them download it for a limited time.

Listing 11. Generate a signed URL for object downloads
// get the bucket
S3Bucket bucket = getBucketFromName(s3Service, bucketName);

// how long should this URL be valid (in minutes)?
int duration = 10;
Calendar cal = Calendar.getInstance();
cal.add(Calendar.MINUTE, duration);
Date expiryDate = cal.getTime();

// create the signed URL
String url = S3Service.createSignedGetUrl(bucketName, objectKey, awsCredentials, expiryDate);
System.out.println("You can use this public URL to access this file for the next "
    + duration + " min - " + url);

S3 allows a maximum of 5GB per object in a bucket. If you have objects that are larger than this, you'll need to split them into multiple files of at most 5GB each and then upload all of the parts to S3.

Listing 12. Upload to S3

// get the bucket
S3Bucket bucket = getBucketFromName(s3Service, bucketName);

// create an object with the file data
File fileData = new File("/my_file_to_upload");
S3Object fileObject = new S3Object(bucket, fileData);

// put the data on S3
s3Service.putObject(bucket, fileObject);
System.out.println("Successfully uploaded object - " + fileObject);

JetS3t provides a DownloadPackage class that makes it simple to associate the data from an S3 object with a local file and automatically save the data to it. You can use this feature to easily download objects from S3.

Listing 13. Download from S3

// get the bucket
S3Bucket bucket = getBucketFromName(s3Service, bucketName);

// get the object
S3Object fileObject = s3Service.getObject(bucket, fileName);

// associate a file with the object data
DownloadPackage[] downloadPackages = new DownloadPackage[1];
downloadPackages[0] = new DownloadPackage(fileObject, new File(fileObject.getKey()));

// download objects to the associated files
s3ServiceMulti.downloadObjects(bucket, downloadPackages);
System.out.println("Successfully retrieved object to current directory");

This section covered some of the basic functions provided by the JetS3t toolkit and how to use them to interact with S3. See Resources for more about the S3 service and an in-depth discussion of the JetS3t toolkit.
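The 5GB-per-object limit mentioned earlier means larger files must be uploaded in pieces. The splitter below is a hypothetical sketch, not part of JetS3t: it cuts a file into fixed-size chunk files (named .part-0, .part-1, and so on) that could each be uploaded as a separate S3 object and concatenated again after download.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FileSplitter {
    // Split the source file into chunk files of at most chunkSize bytes,
    // named <source>.part-0, <source>.part-1, and so on.
    public static List<File> split(File source, long chunkSize) throws IOException {
        List<File> parts = new ArrayList<File>();
        try (FileInputStream in = new FileInputStream(source)) {
            byte[] buffer = new byte[8192];
            int partIndex = 0;
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, chunkSize));
            while (read > 0) {
                File part = new File(source.getPath() + ".part-" + partIndex++);
                parts.add(part);
                long remaining = chunkSize;
                try (FileOutputStream out = new FileOutputStream(part)) {
                    // fill this part up to chunkSize bytes or end of input
                    while (read > 0) {
                        out.write(buffer, 0, read);
                        remaining -= read;
                        if (remaining == 0) break;
                        read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                    }
                }
                if (remaining == 0) {
                    // this part is full; start reading the next one
                    read = in.read(buffer, 0, (int) Math.min(buffer.length, chunkSize));
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("demo", ".bin");
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[10]);  // a 10-byte file
        }
        for (File p : split(tmp, 4)) {  // expect parts of 4, 4, and 2 bytes
            System.out.println(p.getName() + " " + p.length());
            p.deleteOnExit();
        }
    }
}
```

Each part file could then be uploaded with putObject() exactly as in Listing 12, with the part index recorded in the object key or its metadata so the pieces can be reassembled in order.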
S3 Shell

The interaction with S3 thus far, through small code snippets, can be put into a more useful and longer-lasting form by creating a simple S3 Shell program that you can run from the command line. You'll create a simple Java program that accepts the Amazon Web Services access key and secret key as parameters and presents a console prompt. You can then type a letter or a few letters, such as b to list buckets or om to list objects that match a certain prefix. Use this program for experimentation.

The shell program contains a main() that is filled out with an implementation using the snippets of code from this article. In the interest of space, the code listing for the S3 Shell is not included here. The complete S3 Shell source code, along with its dependencies, is in the download. You can run the shell by simply executing the devworks-s3.jar file.

Listing 14. Running the S3 Shell

java -jar devworks-s3.jar my_aws_access_key my_aws_secret_key

You can type h at any time in the S3 Shell to get a list of supported commands.

Figure 2. Help in the S3 Shell

Some of the more useful methods have been added to the S3 Shell. You can extend it to add any other functions you want to make the shell even more useful to your
specific case.

Summary

In this article you learned some of the basic concepts behind Amazon's S3 service. The JetS3t toolkit is an open source library you can use to interact with S3. You also learned how to build a simple S3 Shell from sample snippets of code, so you can continue to experiment easily with S3 from the command line. Stay tuned for the next article in this series, which will explain how to use Amazon Elastic Compute Cloud (EC2) to run virtual servers in the cloud.
Downloads

Description: Sample code for this article
Name: devworks-s3.zip
Size: 2.93MB
Download method: HTTP

Information about download methods
Resources

Learn

• Learn about specific Amazon Web Services:
  • Amazon Simple Storage Service (S3)
  • Amazon Elastic Compute Cloud (EC2)
  • Amazon Simple Queue Service (SQS)
  • Amazon SimpleDB (SDB)
• The Service Health Dashboard is updated by the Amazon team regarding any issues with the services.
• Sign up for an Amazon Web Services account.
• The Amazon Web Services Developer Connection is the gateway to all the developer resources.
• Read the blog to find out the latest happenings in the world of Amazon Web Services.
• From the Web Services Account information page you can manage your keys and certificate, regenerate them, view account activity and usage reports, and modify your profile information.
• S3 Technical Resources has Amazon Web Services technical documentation, user guides, and other articles of interest.
• Amazon S3 has the latest pricing information. Use the AWS Simple Monthly Calculator tool to calculate your monthly usage costs for S3 and the other Amazon Web Services.
• Review the S3 Developer Guide for more details.
• Amazon Service Level Agreement (SLA) for S3.
• The S3stats resource page has several links on processing and viewing S3 log records. Logs are in the S3 server access log format but can be easily converted into the Apache combined log format, then parsed by any of the open source or commercial log analysis tools, such as Webalizer.
• Learn about JetS3t, an open source Java toolkit for Amazon S3, developed by James Murty. See the toolkit documentation, and get detailed explanations of parameters in the configuration guide.
• In the Architecture area on developerWorks, get the resources you need to advance your skills in the architecture arena.
• Browse the technology bookstore for books on these and other technical topics.

Get products and technologies

• Download JetS3t and other tools.
• Download IBM product evaluation versions and get your hands on application development tools and middleware products from IBM® DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

Discuss

• Check out developerWorks blogs and get involved in the developerWorks community.

About the author

Prabhakar Chaganti is the CTO of Ylastic, a start-up that is building a single unified interface to architect, manage, and monitor a user's entire AWS cloud computing environment: EC2, S3, SQS, and SimpleDB. He is the author of two recent books, Xen Virtualization and GWT Java AJAX Programming. He is also the winner of the community choice award for the most innovative virtual appliance in the VMware Global Virtual Appliance Challenge.

Trademarks

IBM, the IBM logo, ibm.com, DB2, developerWorks, Lotus, Rational, Tivoli, and WebSphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.