10 Features of AWS S3
To view the complete course: https://www.101daysofdevops.com/courses/100-days-of-aws/
You can contact me via https://linktr.ee/prashant.lakhera
S3 Storage Class
Amazon S3 offers a range of storage classes designed for different use cases.
Standard (default)
- When you store an object, S3 replicates it across at least three Availability Zones.
- 11 9's of durability (99.999999999%): if you store 10 million objects in S3, then on average you might lose one object every 10,000 years.
- Replication uses MD5 checksums and cyclic redundancy checks (CRCs) to detect and fix data corruption.
- Use this class for frequently accessed data and for data that is critical and non-replaceable.
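The durability figure above can be checked with a little arithmetic. This is a minimal sketch that assumes "11 9's" means an annual loss probability of 10^-11 per object; the function name is made up for illustration.

```python
from fractions import Fraction

# Assumption: 99.999999999% annual durability means each object has a
# 1 in 10^11 chance of being lost in a given year.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    expected_losses_per_year = object_count * ANNUAL_LOSS_PROBABILITY
    return 1 / expected_losses_per_year


print(years_per_lost_object(10_000_000))  # 10000
```

Exact fractions are used instead of floats so the result is not distorted by rounding.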
S3 Standard-Infrequent Access (S3 Standard-IA)
- Storage in this class costs roughly half as much as S3 Standard.
- There is a retrieval fee for every gigabyte of data you retrieve.
- Objects have a minimum storage duration of 30 days; an object removed earlier is still billed for 30 days.
- There is also a minimum billable size of 128 KB per object; smaller objects are billed as 128 KB.
- Use this class for long-lived, infrequently accessed data that is still critical and non-replaceable.
S3 One Zone-Infrequent Access (S3 One Zone-IA)
- It shares the attributes of S3 Standard-IA (retrieval fee, 30-day minimum duration, and 128 KB minimum size per object), but data is stored in only one Availability Zone.
- Use this class for long-lived, infrequently accessed data that is non-critical and easily replaceable.
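To see what the 30-day and 128 KB minimums mean in practice, here is a small sketch. It only computes the billable size and duration and deliberately leaves out prices, which vary by Region; the function and constant names are hypothetical.

```python
MIN_BILLABLE_KB = 128   # minimum billable object size for the IA classes
MIN_BILLABLE_DAYS = 30  # minimum storage duration for the IA classes


def billable_usage(size_kb, days_stored):
    """Return the (size in KB, days) that an IA object is billed for."""
    return max(size_kb, MIN_BILLABLE_KB), max(days_stored, MIN_BILLABLE_DAYS)


print(billable_usage(4, 10))    # (128, 30)
print(billable_usage(500, 45))  # (500, 45)
```

A 4 KB object kept for 10 days is billed as if it were 128 KB kept for 30 days, which is why these classes suit larger, long-lived objects.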
S3 Intelligent-Tiering
- It's a storage class that contains five storage tiers. An object in this class can be stored in any of them: Frequent Access, Infrequent Access, Archive Instant Access, Archive Access, and Deep Archive Access.
- The advantage of this class is that you don't need to move objects manually from one tier to another; S3 Intelligent-Tiering does it for you.
- Objects that are accessed again are moved back to the Frequent Access tier. There are no retrieval fees for accessing objects.
- This class charges a monitoring and automation fee per 1,000 objects.
- Use this class for long-lived data with changing or unknown access patterns.
S3 Glacier Instant Retrieval
- This class is similar to S3 Standard-IA, with cheaper storage, more expensive retrieval, and a longer minimum duration (90 days).
- Use it for long-lived data that is accessed about once per quarter but still needs millisecond access.
S3 Glacier Flexible Retrieval
- Storage costs roughly one-sixth as much as S3 Standard.
- Data is retrieved to S3 Standard-IA temporarily using one of three retrieval options: Expedited (1-5 minutes), Standard (3-5 hours), or Bulk (5-12 hours).
- Use it for archival data where frequent or real-time access isn't needed (about once per year).
S3 Glacier Deep Archive
- This is the cheapest storage class, with a 40 KB minimum size and a longer minimum duration (180 days).
- Data is retrieved to S3 Standard-IA temporarily using one of two retrieval options: Standard (12 hours) or Bulk (48 hours).
- Use it for archival data that rarely, if ever, needs to be accessed and can tolerate hours or days of retrieval time, e.g., regulatory or legal data.
S3 Lifecycle Configuration
You can create lifecycle rules on an S3 bucket to transition or expire objects automatically. This is one way to optimize storage cost in a large bucket. A lifecycle configuration is a set of rules, and each rule consists of actions that apply when its criteria are met. A rule can apply to the whole bucket or to a group of objects defined by a prefix or tags. There are two types of action:
- Transition actions: Change the storage class, e.g., from S3 Standard to S3 Standard-IA after 30 days.
- Expiration actions: Delete the object after a given number of days.
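As an illustration, a lifecycle configuration with one transition action and one expiration action might look like the sketch below. The rule ID, the logs/ prefix, and the 365-day expiry are assumptions chosen for the example; the field names follow the S3 lifecycle configuration format.

```python
import json

# One rule: move objects under logs/ to Standard-IA after 30 days,
# then delete them after 365 days. ID, prefix, and day counts are
# hypothetical values for illustration.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

If you save the printed JSON as lifecycle.json, it should be possible to apply it with aws s3api put-bucket-lifecycle-configuration --bucket <bucket name> --lifecycle-configuration file://lifecycle.json.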
S3 Replication
This feature lets you configure replication of objects from a source S3 bucket to a destination S3 bucket. S3 supports two types of replication:
- Cross-Region Replication (CRR): Replicates objects from a source bucket to a destination bucket in a different AWS Region.
- Same-Region Replication (SRR): Replicates objects from a source bucket to a destination bucket in the same AWS Region.
Features:
- You can replicate all objects or a subset of objects.
- By default, the destination uses the same storage class as the source, but you can choose a cheaper class on the destination to save cost.
- By default, replicated objects are owned by the same account as the source objects, but you can change ownership so that the destination account owns them.
- Replication Time Control (RTC) adds a 15-minute replication SLA. Without it, replication is a best-effort process. RTC also adds extra monitoring so you can see which objects are queued for replication.
NOTE:
- Versioning must be enabled on both the source and destination buckets.
- It's a one-way replication from source to destination.
- It can replicate unencrypted objects and objects encrypted with SSE-S3 or SSE-KMS (SSE-KMS requires extra configuration).
- It can't replicate objects whose storage class is Glacier or Glacier Deep Archive.
- Delete marker replication is not enabled by default, so deleting an object on the source does not delete it on the destination.
- If you want an object deleted on the source to also be deleted on the destination, enable Delete marker replication.
- When you save the rule, you are asked whether you also want to replicate existing objects.
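The options above map onto a replication configuration document. The following is a rough sketch of one rule; the role ARN, bucket ARN, rule ID, and function name are placeholders, while the field names follow the S3 replication configuration format.

```python
import json


def replication_configuration(role_arn, destination_bucket_arn,
                              prefix="", storage_class="STANDARD_IA",
                              replicate_delete_markers=False):
    """Build a one-rule replication configuration (illustrative only)."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "example-replication-rule",  # hypothetical name
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix selects every object in the bucket.
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {
                    "Status": "Enabled" if replicate_delete_markers else "Disabled"
                },
                "Destination": {
                    "Bucket": destination_bucket_arn,
                    "StorageClass": storage_class,  # cheaper class on destination
                },
            }
        ],
    }


config = replication_configuration(
    "arn:aws:iam::123456789012:role/example-replication-role",
    "arn:aws:s3:::example-destination-bucket",
    replicate_delete_markers=True,
)
print(json.dumps(config, indent=2))
```

Saved as replication.json, a document like this could be applied with aws s3api put-bucket-replication --bucket <source bucket name> --replication-configuration file://replication.json, provided versioning is already enabled on both buckets.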
S3 Presigned URL
A presigned URL is a URL that you give to a user to grant temporary access to a specific S3 object in your bucket, safely and securely. The credentials for accessing that object are encoded in the URL. With the URL, the user can either upload or download the object.
To create a presigned URL:
- Go to the S3 console https://console.aws.amazon.com/s3/
- Choose the bucket, then choose the object for which you want to create the presigned URL.
- In the Actions menu, click on Share with a presigned URL.
- Specify how long you want the presigned URL to be valid, then click on Create presigned URL.
NOTE: Using the S3 console, you can share an object with a presigned URL for up to 12 hours or until your session expires. To create a presigned URL with a longer time interval, use the AWS CLI or AWS SDK. Time intervals for presigned URLs can be restricted by your IAM policy.
- The presigned URL is automatically copied to your clipboard.
For more info, check this doc https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html
- To create a presigned URL on the command line, pass your bucket name, object name, and expiry interval in seconds (604800 == 1 week; the default expiry time is 1 hour):
aws s3 presign s3://<bucket name>/<object name> --expires-in 604800
For more info check this doc https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/presign.html
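The --expires-in value is given in seconds, which is easy to get wrong by hand. Below is a small sketch that converts a duration into seconds and builds the command string; the helper name and the example bucket are hypothetical, and the 7-day upper bound is the maximum the CLI documents for this option.

```python
from datetime import timedelta

MAX_EXPIRY = timedelta(days=7)  # 604800 seconds, the documented maximum


def presign_command(bucket, key, valid_for):
    """Build an 'aws s3 presign' command string for the given duration."""
    if not timedelta(0) < valid_for <= MAX_EXPIRY:
        raise ValueError("expiry must be more than 0 and at most 7 days")
    seconds = int(valid_for.total_seconds())
    return f"aws s3 presign s3://{bucket}/{key} --expires-in {seconds}"


print(presign_command("example-bucket", "example.csv", timedelta(weeks=1)))
# aws s3 presign s3://example-bucket/example.csv --expires-in 604800
```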
S3 Select and Glacier Select
With S3 Select and Glacier Select, you can retrieve part of an object rather than the entire object. You use SQL-like statements to select the part you need, and only that pre-filtered part is sent back to the client. You can query several file formats, such as CSV, JSON, and Parquet.
A common question is how this differs from AWS Athena, which does similar work. S3 Select runs a query against a single object, so it's a lightweight solution, whereas Athena runs queries across large datasets.
- To use this feature, choose the object, and under the Actions drop-down, choose Query with S3 Select.
- Choose the Input and Output settings based on your file type, then run the SQL query.
- You will see the output at the bottom of the screen.
For more info check the following doc https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-select.html
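Outside the console, the same query is expressed as a set of request parameters. The sketch below only builds those parameters; the bucket, key, and the name and city columns are assumptions for illustration, and the field names follow the SelectObjectContent API.

```python
# Request parameters for an S3 Select query against one CSV object.
# Bucket, key, and column names are hypothetical.
select_request = {
    "Bucket": "example-bucket",
    "Key": "example.csv",
    "ExpressionType": "SQL",
    # Only rows matching the WHERE clause are returned to the client.
    "Expression": "SELECT s.name FROM S3Object s WHERE s.city = 'Seattle'",
    # USE tells S3 to treat the first CSV line as column names.
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"CSV": {}},
}

print(select_request["Expression"])
```

Assuming an SDK such as boto3 is installed, a dictionary like this is the kind of input its select_object_content call expects as keyword arguments.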
S3 Access Points
S3 access points are named network endpoints attached to buckets that you can use to perform S3 object operations. You can create many access points for a bucket, and each access point has distinct access controls and permissions that S3 applies to any request made through it. You can also configure network access controls for an access point, i.e., whether it can be accessed from a VPC or from the internet. Each access point gets its own endpoint address. So rather than using the default S3 endpoint and accessing the bucket as a whole, users can use an access point created for them, with its own endpoint address, to get access to part of the bucket or to the whole bucket with certain restrictions.
To create an access point:
- Go to the S3 console, click on the bucket for which you want to create the access point, and click on Create access point.
- Fill in the details, select the options based on your requirements, and click on Create access point.
- You can now access the bucket's objects through this access point, subject to the policy you configured.
> aws s3 ls s3://arn:aws:s3:us-east-1:<aws account id>:accesspoint/my-demo-ap
2022-11-14 14:38:33 533267 Screen Shot 2022-11-13 at 17.37.46.png
2022-11-14 15:31:34 1513 example.csv
2022-11-14 14:58:20 886507 s3-bucket-public.png
- To create an access point on the command line
aws s3control create-access-point --name example-ap --account-id 123456789012 --bucket example-bucket
For more info check the following doc https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-access-points.html
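The access point ARN used in the listing above follows a fixed pattern. Here is a short sketch that assembles it from its parts; the function name is hypothetical and the account ID is a placeholder.

```python
def access_point_arn(region, account_id, name):
    """Assemble the ARN that identifies an S3 access point."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


arn = access_point_arn("us-east-1", "123456789012", "my-demo-ap")
print(f"aws s3 ls s3://{arn}")
# aws s3 ls s3://arn:aws:s3:us-east-1:123456789012:accesspoint/my-demo-ap
```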
S3 Object Lock
AWS S3 Object Lock allows you to store objects using a write once, read many (WORM) model. This is useful when, to satisfy a legal or compliance requirement, data must not be changed or deleted once it is written. Object Lock can only be enabled on a new bucket; if you want to enable it on an existing bucket, you need to contact AWS Support. Once you enable Object Lock, versioning is enabled as well, and you can't disable Object Lock or suspend versioning on that bucket.
There are two ways it manages object retention:
- Retention period: You specify the retention period in days or years. There are two retention modes. In Compliance mode, an object version can't be overwritten or deleted, its retention mode can't be changed, and its retention period can't be shortened; even the root user can't make changes until the retention period expires. Governance mode is less strict: an object version still can't be overwritten or deleted, but you can grant special permissions that allow the lock settings to be adjusted.
- Legal hold: With this, you don't set a retention period at all. Instead, you turn the legal hold on or off for an object version. Useful to prevent accidental deletion of critical objects.
To enable it, go to the S3 console https://console.aws.amazon.com/s3/ and click on Create bucket.
- Under Advanced settings, enable Object Lock and click on Create bucket. Read the warning message carefully, as it says: "You might be blocked from deleting the objects and the bucket. Additional Object Lock configuration is required in bucket details after bucket creation to protect objects in this bucket from being deleted or overwritten."
For more info check the following doc https://aws.amazon.com/blogs/storage/protecting-data-with-amazon-s3-object-lock/
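The additional configuration the warning refers to is a default retention rule. Below is a minimal sketch of such a rule, assuming a 30-day default in Governance mode; the helper name is hypothetical and the field names follow the S3 Object Lock configuration format.

```python
import json

VALID_MODES = ("GOVERNANCE", "COMPLIANCE")


def default_retention(mode, days):
    """Build an Object Lock configuration with a default retention rule."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


print(json.dumps(default_retention("GOVERNANCE", 30), indent=2))
```

Governance mode is the safer choice while experimenting, because a Compliance-mode lock cannot be lifted by anyone until the retention period ends.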
AWS S3 Storage Lens
AWS S3 Storage Lens provides organization-wide visibility into object storage usage and activity trends, along with actionable recommendations to improve cost efficiency. A default dashboard is automatically created for every AWS account.
For more info check the following doc https://aws.amazon.com/blogs/aws/s3-storage-lens/
AWS S3 Event
The Amazon S3 notification feature enables you to receive notifications (via SNS, SQS, or Lambda) when certain events, such as an object being created or removed, happen in your bucket.
Please check the following doc on how to configure it https://devopslearning.medium.com/100-days-of-devops-day-7-aws-s3-event-cf64c6699ca1
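For a sense of what the configuration looks like, here is a sketch of a notification rule that sends object-created events for .csv files to an SQS queue. The queue ARN and the suffix filter are assumptions for the example; the field names follow the S3 notification configuration format.

```python
import json

# Notify an SQS queue whenever a .csv object is created in the bucket.
# The queue ARN and suffix are hypothetical values.
notification_configuration = {
    "QueueConfigurations": [
        {
            "QueueArn": "arn:aws:sqs:us-east-1:123456789012:example-queue",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
            },
        }
    ]
}

print(json.dumps(notification_configuration, indent=2))
```

Saved as notification.json, this could be applied with aws s3api put-bucket-notification-configuration --bucket <bucket name> --notification-configuration file://notification.json, as long as the queue's policy allows S3 to send messages to it.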
S3 Bucket to host a static website
You can use an S3 bucket to host a static website. To enable static website hosting:
- Go to the specific bucket → Properties.
- Scroll down to the bottom of the page.
- Click on Edit and select Enable. Fill in the details based on your requirements and click on Save changes.
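The details you fill in on that page correspond to a small website configuration document. This is a sketch that assumes index.html and error.html as the document names; the field names follow the S3 website configuration format.

```python
import json

# index.html and error.html are assumed file names; use whatever
# documents your site actually contains.
website_configuration = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}

print(json.dumps(website_configuration, indent=2))
```

Saved as website.json, this could be applied with aws s3api put-bucket-website --bucket <bucket name> --website-configuration file://website.json.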
S3 Encryption
S3 supports two main methods of encryption:
- Client-Side Encryption: Objects are encrypted by the client before they are sent to S3.
- Server-Side Encryption: Amazon S3 encrypts the object before saving it to disk and decrypts it when you download it. There are three types of server-side encryption:
- Server-Side Encryption with Customer-Provided Keys (SSE-C): The customer is responsible for the encryption keys used for encryption and decryption, and the S3 service manages the actual encryption and decryption process.
- Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3): AWS handles the encryption and decryption process as well as key generation and management. It uses the AES-256 algorithm.
- Server-Side Encryption with KMS (SSE-KMS): This is similar to SSE-S3, except that the encryption key is managed by Key Management Service (KMS) rather than by S3. KMS adds a layer of security by letting you control who has permission to perform encrypt and decrypt operations. KMS also provides an audit trail showing which user used a key.
Now when you upload a file to S3, you can specify the encryption type.
NOTE: Buckets are not encrypted; the objects inside buckets can be encrypted.
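When uploading through an SDK, the encryption type is passed as extra parameters on the upload. The sketch below builds those parameters for SSE-S3 and SSE-KMS; the helper name and the key alias are hypothetical, and the parameter names are the ones the PutObject API uses.

```python
def sse_upload_args(method, kms_key_id=None):
    """Return PutObject parameters that request server-side encryption."""
    if method == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if method == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Without a key ID, S3 falls back to the AWS managed key.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    raise ValueError("method must be 'SSE-S3' or 'SSE-KMS'")


print(sse_upload_args("SSE-S3"))
print(sse_upload_args("SSE-KMS", "alias/example-key"))
```

Assuming boto3 is installed, a dictionary like this could be passed as ExtraArgs to upload_file or unpacked into put_object. SSE-C is left out because it needs the raw key material supplied with every request.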