S3
Amazon S3 (Simple Storage Service) is a scalable cloud storage service provided by Amazon Web Services (AWS). It offers a way to store and retrieve data, such as files, images, videos, and backups, in a highly durable and easily accessible manner, making it a foundational component for various cloud-based applications and services.
Prerequisites
- You have created an AWS S3 bucket.
- You have created an IAM user with the AmazonS3FullAccess policy (or an equivalent custom policy granting access to your bucket) attached.
Authorize the Connection to Amazon S3
In AWS Portal
Create S3 Bucket
- Sign in to the AWS Management Console and navigate to the S3 console.
- Click on Create bucket. Name your bucket and choose an AWS Region. You will need to provide this information to Dataddo.
- Click on Create bucket to finalize the process.
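If you prefer to script this step, the following is a minimal sketch using boto3 (the Python AWS SDK); the bucket name and region are placeholders, so substitute your own values.

import boto3

# Placeholders: replace with your own bucket name and AWS Region.
BUCKET_NAME = "your-bucket-name"
REGION = "eu-west-1"

s3 = boto3.client("s3", region_name=REGION)

# Outside us-east-1, S3 requires an explicit LocationConstraint.
s3.create_bucket(
    Bucket=BUCKET_NAME,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)
print(f"Created bucket {BUCKET_NAME} in {REGION}")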
Configure the Access Permissions
- Navigate to the IAM service, click on Users, and continue with Add user.
- Name your user and, for the access type, select Programmatic access.
- In the Set permissions step, choose Attach existing policies directly. Attach a policy that grants the necessary S3 permissions. This could be:
- The AmazonS3FullAccess policy, which grants full access to all S3 resources, or
- A custom policy that only grants access to the necessary bucket and actions.
- Save the IAM user's credentials. You will be given an access key and a secret key. These credentials will be needed for configuring the connection to your S3 bucket in Dataddo.
Use the example below to configure the access policy. Make sure to replace your-bucket-name with your bucket identifier. Note that s3:ListBucket applies to the bucket itself, while the object-level actions apply to the objects inside it, which is why the policy lists both ARNs.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
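If you would rather script the user setup, a boto3 sketch along these lines should work; the user name and inline policy name are illustrative placeholders, not values required by Dataddo.

import json
import boto3

iam = boto3.client("iam")
USER_NAME = "dataddo-s3-user"   # illustrative placeholder
BUCKET = "your-bucket-name"

# Inline policy equivalent to the JSON example above.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:PutObject", "s3:PutObjectAcl",
            "s3:GetObject", "s3:GetObjectAcl",
            "s3:DeleteObject", "s3:ListBucket",
        ],
        "Resource": [
            f"arn:aws:s3:::{BUCKET}",
            f"arn:aws:s3:::{BUCKET}/*",
        ],
    }],
}

iam.create_user(UserName=USER_NAME)
iam.put_user_policy(
    UserName=USER_NAME,
    PolicyName="DataddoS3Access",
    PolicyDocument=json.dumps(policy),
)

# Generate the access key and secret key to enter in Dataddo.
key = iam.create_access_key(UserName=USER_NAME)["AccessKey"]
print("Access key:", key["AccessKeyId"])
print("Secret key:", key["SecretAccessKey"])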
In Dataddo
- In the Authorizers tab, click on Authorize New Service and select S3.
- You will be asked to fill in the following fields:
- Bucket: The identifier of the S3 bucket you want to use for reading or writing data.
- Region: The AWS Region of the S3 bucket.
- Key: Your AWS Access Key.
- Secret: Your AWS Secret Key.
- Click on Save.
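Before saving, you can sanity-check the key pair outside of Dataddo. The following is a minimal boto3 sketch; the credentials, region, and bucket name are placeholders.

import boto3

# Placeholders: use the same values you plan to enter in Dataddo.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
    region_name="eu-west-1",
)

# A successful response confirms the key pair can list the bucket.
resp = s3.list_objects_v2(Bucket="your-bucket-name", MaxKeys=1)
print("Credentials OK, objects found:", resp["KeyCount"])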
Create a New S3 Destination
- Under the Destinations tab, click on the Create Destination button and select the destination from the list.
- Select your account from the drop-down menu.
- Fill in the Path. Use the name of the folder in your bucket followed by a slash (e.g. "database/").
- Name your destination and click on Save to create your destination.
If your account is not listed, click on Add new Authorizer in the drop-down menu during authorizer selection and follow the on-screen prompts. You can also go to the Authorizers tab and click on Authorize New Service.
Creating a Flow to S3 Storage
- Navigate to Flows and click on Create Flow.
- Click on Connect Your Data to add your sources.
- Click on Connect Your Data Destination to add the destination.
- Choose the Write mode and fill in the other required information.
- Check the Data Preview to see if your configuration is correct.
- Name your flow, and click on Create Flow to finish the setup.
File Partitioning
File partitioning refers to the practice of dividing large datasets into smaller, more manageable segments or partitions based on specific criteria, such as values in a particular column or range of values. Each partition contains a subset of the data that shares common attributes or characteristics. File partitioning is commonly used to improve data organization, query performance, and data management.
Dataddo supports file partitioning during flow creation. If you use, for example,
file_{{1d1|Ymd}}
Dataddo will create a new file every day, e.g. file_20xx0101, file_20xx0102, etc.
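As a rough illustration of how such a daily Ymd pattern expands into file names (this sketch only mimics the naming convention; it is not Dataddo's internal code):

from datetime import date, timedelta

# Mimic the file_{{1d1|Ymd}} pattern: one file per day, suffixed with the Ymd date.
start = date(2024, 1, 1)   # illustrative start date
for offset in range(3):
    day = start + timedelta(days=offset)
    print("file_" + day.strftime("%Y%m%d"))
# Output: file_20240101, file_20240102, file_20240103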