@chrisswanda
Last active March 6, 2023 13:51
AWS DataSync between two S3 buckets across two different AWS accounts

What is it?

DataSync fully automates the data transfer. It comes with retry and network resiliency mechanisms, network optimizations, built-in task scheduling, monitoring via the DataSync API and Console, and CloudWatch metrics, events and logs that provide granular visibility into the transfer process. DataSync performs data integrity verification both during the transfer and at the end of the transfer.

DataSync provides end-to-end security, and integrates directly with AWS storage services. All data transferred between the source and destination is encrypted via TLS, and access to your AWS storage is enabled via built-in AWS security mechanisms such as IAM roles. DataSync can also be used with VPC endpoints so that data transferred between an organization and AWS does not traverse the public internet, further increasing the security of data as it is copied over the network.

How to copy between two different AWS Accounts and S3 buckets

You will need to define a source account and bucket, and a destination account and bucket. In these examples the source bucket is named source and the destination bucket is named destination.

In your AWS source account

In the source AWS account, create an IAM role that allows DataSync to access the S3 bucket in your AWS destination account. Give the role the following permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": [
          "s3:GetBucketLocation",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads"
        ],
        "Effect": "Allow",
        "Resource": "arn:aws:s3:::destination"
      },
      {
        "Action": [
          "s3:AbortMultipartUpload",
          "s3:DeleteObject",
          "s3:GetObject",
          "s3:ListMultipartUploadParts",
          "s3:PutObject",
          "s3:GetObjectTagging",
          "s3:PutObjectTagging"
        ],
        "Effect": "Allow",
        "Resource": "arn:aws:s3:::destination/*"
      }
    ]
  }

Also add a trust policy to this role so that the DataSync service can assume it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "datasync.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Note the ARN of the role you just created, because you will need it later. For this example it is arn:aws:iam::1234567890:role/Datasync-destination-role (1234567890 is a placeholder; real AWS account IDs have 12 digits).
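If you would rather create the role from the terminal than from the IAM console, here is a minimal sketch. It assumes the two JSON documents above are saved as permissions-policy.json and trust-policy.json, and that an AWS CLI profile called source points at the source account. Those file names, the profile name, and the inline policy name Datasync-destination-access are made up for this example.

```shell
# Hypothetical names, adjust to your own setup.
ROLE_NAME="Datasync-destination-role"
POLICY_NAME="Datasync-destination-access"   # made-up inline policy name
SOURCE_PROFILE="source"                     # made-up CLI profile for the source account

create_datasync_role() {
  # Create the role with the DataSync trust policy.
  aws iam create-role \
    --profile "$SOURCE_PROFILE" \
    --role-name "$ROLE_NAME" \
    --assume-role-policy-document file://trust-policy.json || return 1

  # Attach the S3 permissions to the role as an inline policy.
  aws iam put-role-policy \
    --profile "$SOURCE_PROFILE" \
    --role-name "$ROLE_NAME" \
    --policy-name "$POLICY_NAME" \
    --policy-document file://permissions-policy.json
}
```

Running create_datasync_role from the directory that holds the two JSON files should print the new role, including its Arn field.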

Next, log in to your AWS destination account and go to the S3 bucket (s3://destination) that you want to land the data in. Go to Permissions, and under Object Ownership, choose Edit and select ACLs disabled (recommended).
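The same Object Ownership setting can also be applied from the CLI. This is a sketch, assuming a CLI profile called destination (a made-up name) that points at the destination account; the console option "ACLs disabled" corresponds to the BucketOwnerEnforced setting.

```shell
DEST_BUCKET="destination"
DEST_PROFILE="destination"   # made-up CLI profile for the destination account

disable_bucket_acls() {
  # BucketOwnerEnforced is the API name for "ACLs disabled".
  aws s3api put-bucket-ownership-controls \
    --profile "$DEST_PROFILE" \
    --bucket "$DEST_BUCKET" \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'
}

show_bucket_ownership() {
  # Print the current Object Ownership rules for the bucket.
  aws s3api get-bucket-ownership-controls \
    --profile "$DEST_PROFILE" \
    --bucket "$DEST_BUCKET"
}
```

Call disable_bucket_acls once, then show_bucket_ownership to confirm the change took effect.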

In your AWS destination account

Go to your destination S3 bucket (s3://destination) and apply the following bucket policy to it. Note that the Principal in the first statement is the role you created in the AWS source account.

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "DataSyncCreateS3LocationAndTaskAccess",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::1234567890:role/Datasync-destination-role"
            },
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:GetObjectTagging",
                "s3:PutObjectTagging"
            ],
            "Resource": [
                "arn:aws:s3:::destination",
                "arn:aws:s3:::destination/*"
            ]
        },
        {
            "Sid": "DataSyncCreateS3Location",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::1234567890:root"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::destination"
        }
    ]
}

Save this.
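If you prefer the terminal, the bucket policy can be applied with the CLI instead of the console. A sketch, assuming the policy above is saved as bucket-policy.json and that a CLI profile called destination points at the destination account (both names are made up for this example):

```shell
DEST_BUCKET="destination"
DEST_PROFILE="destination"   # made-up CLI profile for the destination account

apply_destination_bucket_policy() {
  # Replace the bucket policy with the contents of bucket-policy.json.
  aws s3api put-bucket-policy \
    --profile "$DEST_PROFILE" \
    --bucket "$DEST_BUCKET" \
    --policy file://bucket-policy.json
}

show_destination_bucket_policy() {
  # Print the policy currently attached to the bucket.
  aws s3api get-bucket-policy \
    --profile "$DEST_PROFILE" \
    --bucket "$DEST_BUCKET" \
    --query Policy --output text
}
```

Keep in mind that put-bucket-policy replaces any policy already on the bucket, so merge your statements into the existing policy first if there is one.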

In the terminal of your AWS source account

In your terminal, make sure that your active AWS CLI profile is for the AWS source account. This is where you need the role ARN that you noted earlier.
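A quick way to check which account your terminal is pointed at is to ask STS. This sketch compares the result with the placeholder account ID used in this example; swap in your real source account ID.

```shell
EXPECTED_ACCOUNT_ID="1234567890"   # placeholder source account ID from this example

check_source_account() {
  # Ask STS which account the current credentials belong to.
  actual=$(aws sts get-caller-identity --query Account --output text) || return 1
  if [ "$actual" = "$EXPECTED_ACCOUNT_ID" ]; then
    echo "OK: using account $actual"
  else
    echo "Wrong account: $actual (expected $EXPECTED_ACCOUNT_ID)" >&2
    return 1
  fi
}
```

If check_source_account reports the wrong account, switch profiles (for example with the AWS_PROFILE environment variable) before continuing.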

In your terminal run:

aws datasync create-location-s3 --s3-bucket-arn arn:aws:s3:::destination --s3-config '{"BucketAccessRoleArn":"arn:aws:iam::1234567890:role/Datasync-destination-role"}'

This creates a DataSync location for your destination bucket and configures it to use the role ARN that you noted earlier. (AWS iz teh dum, because we should just be able to do it from the console... whatevs.)

If successful you should see something like:

{
    "LocationArn": "arn:aws:datasync:us-east-2:1234567890:location/loc-0b72deadbeefe2d4d3752"
}
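You will need that LocationArn again when creating the task, so it can be handy to capture it in a variable. The sketch below is a variant of the command above (run it instead of that command, not in addition to it, or you will end up with two locations). It uses the CLI's shorthand syntax for --s3-config and the global --query option to print only the ARN.

```shell
DEST_ROLE_ARN="arn:aws:iam::1234567890:role/Datasync-destination-role"

create_destination_location() {
  # Same call as above, but print only the LocationArn as plain text.
  aws datasync create-location-s3 \
    --s3-bucket-arn arn:aws:s3:::destination \
    --s3-config "BucketAccessRoleArn=$DEST_ROLE_ARN" \
    --query LocationArn --output text
}

# Usage:
#   DEST_LOCATION_ARN=$(create_destination_location)
#   aws datasync describe-location-s3 --location-arn "$DEST_LOCATION_ARN"
```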

In the console of your AWS source account

Now that you have created the location for the destination bucket, log in to the console of the source account and select the Region that the source bucket resides in. Go to DataSync and create a new location for the source bucket, selecting Autogenerate to create the IAM role for this location.
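Before moving on to tasks, it can help to confirm that both locations now exist in the source account. A small sketch; note that DataSync locations are listed per Region, so run it against the Region you created them in.

```shell
list_datasync_locations() {
  # One line per location: its ARN followed by its URI (for example s3://destination/).
  aws datasync list-locations \
    --query 'Locations[].[LocationArn,LocationUri]' \
    --output text
}
```

You should see one entry for the source bucket and one for the destination bucket.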

Creating Tasks to migrate data

Once you have created both the source and destination locations, navigate to Tasks under the DataSync page and select Create task. First, select the source location, then select Next.

Next, select the destination location.

Provide your task with a name and configure it to your specifications. When complete, choose Next.

Lastly, review your configuration and select Create task. You’re now ready to execute your task and start copying objects from the source S3 bucket to your destination S3 bucket.
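The task can also be created and started from the terminal of the source account. This is a rough sketch rather than a full configuration: it uses DataSync's default task options, and the task name source-to-destination is made up for this example. Pass in the two location ARNs you created earlier.

```shell
TASK_NAME="source-to-destination"   # made-up task name

create_and_start_task() {
  # $1 = source location ARN, $2 = destination location ARN
  task_arn=$(aws datasync create-task \
    --source-location-arn "$1" \
    --destination-location-arn "$2" \
    --name "$TASK_NAME" \
    --query TaskArn --output text) || return 1

  # Start one execution of the task and print its execution ARN.
  aws datasync start-task-execution \
    --task-arn "$task_arn" \
    --query TaskExecutionArn --output text
}

# Usage:
#   EXEC_ARN=$(create_and_start_task "$SOURCE_LOCATION_ARN" "$DEST_LOCATION_ARN")
#   aws datasync describe-task-execution --task-execution-arn "$EXEC_ARN" --query Status
```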
