Amazon S3


Amazon S3 (Simple Storage Service) is a scalable cloud storage service provided by Amazon Web Services (AWS). It offers a way to store and retrieve data, such as files, images, videos, and backups, in a highly durable and easily accessible manner, making it a foundational component for various cloud-based applications and services.

Prerequisites

  • You have created an AWS S3 bucket.
  • You have created an IAM user with the AmazonS3FullAccess policy, or a custom policy granting access to your bucket, attached.

Authorize Connection to Amazon S3

In the AWS Console

Create S3 Bucket

  1. Sign in to the AWS Management Console and navigate to the S3 console.
  2. Click on Create bucket. Name your bucket and choose an AWS Region. You will need to provide this information to Dataddo.
  3. Click on Create bucket to finalize the process. Alternatively, you can script this step, as sketched below.
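
A minimal sketch of the same step with Python and boto3; the bucket name and Region are placeholders to replace with your own values.

import boto3

REGION = "eu-central-1"      # placeholder: your chosen AWS Region
BUCKET = "your-bucket-name"  # placeholder: your bucket identifier

s3 = boto3.client("s3", region_name=REGION)

# Outside us-east-1, S3 requires an explicit LocationConstraint;
# in us-east-1, omit CreateBucketConfiguration entirely.
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)
print(f"Created bucket {BUCKET} in {REGION}")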

Configure the Access Permissions

  1. Navigate to the IAM service, click on Users and continue with Add User.
  2. Name your user and select Programmatic access as the access type.
  3. In the Set permissions step, choose Attach existing policies directly. Attach a policy that grants the necessary S3 permissions. This could be one of the following:
    • The AmazonS3FullAccess policy, which grants full access to all S3 resources.
    • A custom policy that only grants access to the necessary bucket and actions.
  4. Save the IAM user's credentials. You will be given an access key and a secret key. These credentials will be needed for configuring the connection to your S3 bucket in Dataddo. These steps can also be scripted, as sketched below.
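
A minimal sketch of steps 1-4 with Python and boto3, assuming you attach the AmazonS3FullAccess managed policy; the user name is a placeholder.

import boto3

USER = "dataddo-s3-user"  # placeholder user name

iam = boto3.client("iam")

# Steps 1-2: create the IAM user.
iam.create_user(UserName=USER)

# Step 3: attach the AWS-managed AmazonS3FullAccess policy.
iam.attach_user_policy(
    UserName=USER,
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)

# Step 4: create programmatic credentials for Dataddo.
resp = iam.create_access_key(UserName=USER)
print("Access key:", resp["AccessKey"]["AccessKeyId"])
print("Secret key:", resp["AccessKey"]["SecretAccessKey"])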

If you opt for a custom policy instead, use the template below. Make sure to replace your-bucket-name with your bucket identifier.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
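
Before entering the credentials in Dataddo, you can verify that they work, a minimal sketch with Python and boto3 (keys, Region, and bucket name are placeholders):

import boto3

BUCKET = "your-bucket-name"

s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY",      # placeholder
    aws_secret_access_key="YOUR_SECRET_KEY",  # placeholder
    region_name="eu-central-1",               # placeholder: bucket's Region
)

# Write a test object, list the bucket, then clean up.
s3.put_object(Bucket=BUCKET, Key="dataddo-test.txt", Body=b"hello")
resp = s3.list_objects_v2(Bucket=BUCKET, MaxKeys=5)
print([obj["Key"] for obj in resp.get("Contents", [])])
s3.delete_object(Bucket=BUCKET, Key="dataddo-test.txt")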

In Dataddo

  1. In the Authorizers tab, click on Authorize New Service and select S3.
  2. You will be asked to fill in the following fields:
    1. Bucket: Provide the identifier of the S3 bucket you want to use for reading or writing data.
    2. Region: Provide the AWS Region of the S3 bucket.
    3. Key: Provide your AWS Access Key.
    4. Secret: Provide your AWS Secret Key.
  3. Click on Save.

Create a New S3 Destination

  1. Under the Destinations tab, click on the Create Destination button and select the destination from the list.
  2. Select your account from the drop-down menu.
  3. Fill in the Path. Use the name of the folder in your bucket followed by a slash (e.g. "database/").
  4. Name your destination and click on Save to create your destination.
Need to authorize another connection?

Click on Add new Authorizer in the drop-down menu during authorizer selection and follow the on-screen prompts. You can also go to the Authorizers page and click on Add New Service.

Creating a Flow to S3 Storage

  1. Navigate to Flows and click on Create Flow.
  2. Click on Connect Your Data to add your source(s).
  3. Click on Connect Your Data Destination to add the destination.
  4. Choose the write mode and fill in the other required information.
  5. Check the Data Preview to see if your configuration is correct.
  6. Name your flow and click on Create Flow to finish the setup.

File Partitioning

File partitioning splits large datasets into smaller, manageable partitions, based on criteria like date. This technique enhances data organization, query performance, and management by grouping subsets of data with shared attributes.

During flow creation:

  • Select one of the predefined file name patterns, or
  • Define your own custom name to suit your partitioning needs.

Example of a custom file name
When creating a custom file name, build on the offered file name patterns.

For example, take a base file name and add a date pattern:

xyz_{{1d1|Ymd}}

With this file name, Dataddo will create a new file every day, e.g. xyz_20xx0101, xyz_20xx0102, etc.
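
To illustrate how such date-suffixed partition names expand, here is a minimal sketch in Python. The assumption that the Ymd token formats a date as YYYYMMDD (i.e. strftime's %Y%m%d, as the example output suggests) is ours; consult Dataddo's placeholder reference for the exact semantics of {{1d1|Ymd}}.

from datetime import date, timedelta

BASE = "xyz"

# Assumption: Ymd formats a date as YYYYMMDD, i.e. strftime("%Y%m%d").
def partition_name(day: date) -> str:
    return f"{BASE}_{day.strftime('%Y%m%d')}"

start = date(2024, 1, 1)
for offset in range(3):
    print(partition_name(start + timedelta(days=offset)))
# xyz_20240101
# xyz_20240102
# xyz_20240103

One file per day keeps partitions small and lets downstream consumers prune by date prefix.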


Troubleshooting

Error Message kms:GenerateDataKey

ERROR MESSAGE

Action failed: stream transfer: write data from stream: upload JSON to s3: uploading data to S3 bucket 'bucket-name': operation error S3: PutObject, https response error StatusCode: 403, RequestID: request_id, HostID: host_id api error AccessDenied: User: user_name is not authorized to perform: kms:GenerateDataKey on resource: arn:aws:s3:::bucket-name/ because no identity-based policy allows the kms:GenerateDataKey action

This issue is most likely caused by your bucket using server-side encryption with an AWS KMS key (SSE-KMS). To solve this, add the kms:GenerateDataKey permission to your IAM user, for example using the template below. Make sure to replace the Resource with the ARN of the KMS key that encrypts your bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "kms:GenerateDataKey",
            "Resource": "arn:aws:kms:your-region:your-account-id:key/your-key-id"
        }
    ]
}
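
If you are not sure which KMS key encrypts your bucket, you can look it up, a minimal sketch with Python and boto3 (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")

# Returns the bucket's default server-side encryption configuration.
resp = s3.get_bucket_encryption(Bucket="your-bucket-name")
for rule in resp["ServerSideEncryptionConfiguration"]["Rules"]:
    sse = rule["ApplyServerSideEncryptionByDefault"]
    print("Algorithm:", sse["SSEAlgorithm"])      # e.g. aws:kms
    print("KMS key:", sse.get("KMSMasterKeyID"))  # present for SSE-KMS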

