@weavenet
Created May 4, 2015 05:21
Delete all versions of all files in s3 versioned bucket using AWS CLI and jq.
#!/bin/bash
bucket=$1
set -e
echo "Removing all versions from $bucket"
versions=`aws s3api list-object-versions --bucket $bucket |jq '.Versions'`
markers=`aws s3api list-object-versions --bucket $bucket |jq '.DeleteMarkers'`
let count=`echo $versions |jq 'length'`-1
if [ $count -gt -1 ]; then
echo "removing files"
for i in $(seq 0 $count); do
key=`echo $versions | jq .[$i].Key |sed -e 's/\"//g'`
versionId=`echo $versions | jq .[$i].VersionId |sed -e 's/\"//g'`
cmd="aws s3api delete-object --bucket $bucket --key $key --version-id $versionId"
echo $cmd
$cmd
done
fi
let count=`echo $markers |jq 'length'`-1
if [ $count -gt -1 ]; then
echo "removing delete markers"
for i in $(seq 0 $count); do
key=`echo $markers | jq .[$i].Key |sed -e 's/\"//g'`
versionId=`echo $markers | jq .[$i].VersionId |sed -e 's/\"//g'`
cmd="aws s3api delete-object --bucket $bucket --key $key --version-id $versionId"
echo $cmd
$cmd
done
fi
@nicdoye

nicdoye commented Oct 30, 2017

@jnawk Nice! Minor typo: it should be boto3.Session() not boto3.session().

@mattbryson

mattbryson commented Nov 9, 2017

@jnawk That's awesome! Saved me so much time. Two quick questions...

  1. object_versions.delete() doesn't appear to remove zero byte keys, any idea how to get round that?
  2. Is there a way to enable verbose output on boto3? .delete() can take a long time on large buckets, so it would be good to see some sort of progress...

UPDATE: it's not zero byte keys, it's Mac OS "Icon?" files. When uploaded to S3, a newline gets appended to the file name, which breaks all the S3 tooling, even the console. I have raised this with AWS.
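
On the progress question: the resource-level .delete() call gives no feedback, but the same work can be done with the low-level client one page at a time while printing a running total. What follows is only a minimal sketch, assuming an object that behaves like boto3.client("s3") is passed in. The helper name delete_all_versions and the report callback are made up for illustration and are not part of boto3.

```python
def delete_all_versions(client, bucket, prefix="", report=print):
    """Delete every object version and delete marker, reporting progress.

    `client` is expected to behave like boto3.client("s3"); the helper name
    and the `report` callback are illustrative, not part of boto3.
    """
    paginator = client.get_paginator("list_object_versions")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Each page may carry "Versions", "DeleteMarkers", both, or neither.
        entries = page.get("Versions", []) + page.get("DeleteMarkers", [])
        objects = [{"Key": e["Key"], "VersionId": e["VersionId"]} for e in entries]
        # delete_objects accepts at most 1000 keys per request.
        for start in range(0, len(objects), 1000):
            batch = objects[start:start + 1000]
            response = client.delete_objects(
                Bucket=bucket, Delete={"Objects": batch, "Quiet": True}
            )
            errors = response.get("Errors", [])
            for error in errors:
                report(f"failed: {error.get('Key')} ({error.get('Code')})")
            deleted += len(batch) - len(errors)
            report(f"deleted {deleted} versions so far")
    return deleted
```

Assuming boto3 is installed and credentials are configured, it could be called as delete_all_versions(boto3.client("s3"), "your_bucket_name"). Passing a different report callable (a logger method, for example) changes where the progress lines go.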

@joelthompson

There's a small bug when you have only a single object and/or delete marker. Basically, with this:

let count=`echo $versions |jq 'length'`-1

This happens because let returns exit status 1 whenever its expression evaluates to 0. So if count is 0, bash counts that as an error, and because you do a set -e above, this causes the script to fail out.

@JohnVonNeumann

You are a champion amongst men. Cheers.

@arichiardi

arichiardi commented May 9, 2018

Thanks for this script, I tweaked the original one in order to urldecode UTF-8 keys coming from the bucket:

#!/bin/bash

bucket=$1

set -e

echo "Removing all versions from $bucket"

function urldecode {
    # Python 3: unquote_plus lives in urllib.parse, and print is a function
    python3 -c "import sys, urllib.parse as ul; print(ul.unquote_plus(sys.argv[1]))" "$1"
}

versions=`aws s3api list-object-versions --encoding-type url --bucket $bucket | jq '.Versions'`
markers=`aws s3api list-object-versions --encoding-type url --bucket $bucket | jq '.DeleteMarkers'`

echo "removing files"
for version in $(echo "${versions}" | jq -r '.[] | @base64'); do
    version=$(echo ${version} | base64 --decode)

    key=`echo "$version" | jq -r .Key`
    versionId=`echo "$version" | jq -r .VersionId`
    decodedKey=$(urldecode "$key")
    # Build the command as an array so keys containing spaces stay one argument
    cmd=(aws s3api delete-object --bucket "$bucket" --key "$decodedKey" --version-id "$versionId")
    echo "${cmd[@]}"
    "${cmd[@]}"
done

echo "removing delete markers"
for marker in $(echo "${markers}" | jq -r '.[] | @base64'); do
    marker=$(echo ${marker} | base64 --decode)

    key=`echo "$marker" | jq -r .Key`
    versionId=`echo "$marker" | jq -r .VersionId`
    decodedKey=$(urldecode "$key")
    # Build the command as an array so keys containing spaces stay one argument
    cmd=(aws s3api delete-object --bucket "$bucket" --key "$decodedKey" --version-id "$versionId")
    echo "${cmd[@]}"
    "${cmd[@]}"
done
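
The decoding step that the script shells out to Python for can be checked in isolation. A small sketch in Python 3, where unquote_plus lives in urllib.parse; the helper name decode_key and the sample key are invented for illustration.

```python
from urllib.parse import unquote_plus


def decode_key(encoded_key):
    """Turn a URL-encoded key (as listed with --encoding-type url) back into text."""
    # unquote_plus decodes %XX escapes as UTF-8 and turns "+" into a space.
    return unquote_plus(encoded_key)


print(decode_key("reports/caf%C3%A9+menu.txt"))  # prints: reports/café menu.txt
```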

@tokozedg

import boto3

bucket = boto3.resource('s3').Bucket('your_bucket_name')  # placeholder bucket name
bucket.object_versions.filter(
        Prefix='folder'
).delete()

This worked for me very well.

@ip1981

ip1981 commented Sep 24, 2018

Guys, there is set -x for this:

cmd="aws s3api delete-object --bucket $bucket --key $key --version-id $versionId"
echo $cmd

@ip1981

ip1981 commented Sep 24, 2018

And you can use Jq to build up command lines:

... | jq -r '.Versions[] | "aws s3api delete-object --bucket capitalmatch-backups --key \"\(.Key)\" --version-id \"\(.VersionId)\""'

@felipekiko

Works for me with some encoding filenames in version files. Thanks!!

@wknapik

wknapik commented Mar 11, 2019

@kaosinc

kaosinc commented Feb 16, 2020

Thank you SO MUCH!

@nashjain

nashjain commented Apr 8, 2020

There is actually a much simpler and faster approach:

bucket=$1
fileToDelete=$2
deleteBefore=$3
fileName='aws_delete.json'
rm -f "$fileName"
versionsToDelete=`aws s3api list-object-versions --bucket "$bucket" --prefix "$fileToDelete" --query "Versions[?(LastModified<'$deleteBefore')].{Key: Key, VersionId: VersionId}"`
cat << EOF > $fileName
{"Objects":$versionsToDelete, "Quiet":true}
EOF
aws s3api delete-objects --bucket "$bucket" --delete file://$fileName

s3api delete-objects can handle up to 1000 records.
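
Because of that 1000-record limit, a larger listing has to be split before it is sent. The following is a rough sketch of the filter-and-batch step in Python, assuming the input is the list found under "Versions" in the list-object-versions JSON output. The function name build_delete_payloads is hypothetical.

```python
def build_delete_payloads(versions, delete_before, batch_size=1000):
    """Build delete-objects payloads for versions older than `delete_before`.

    `versions` is the list found under "Versions" in the list-object-versions
    JSON output; `delete_before` is an ISO 8601 date string like "2020-01-01".
    """
    # ISO 8601 timestamps sort lexicographically, so a plain string comparison
    # mirrors the LastModified<'...' filter in the --query expression.
    selected = [
        {"Key": v["Key"], "VersionId": v["VersionId"]}
        for v in versions
        if v["LastModified"] < delete_before
    ]
    # Slice ends are exclusive, so each slice holds at most batch_size entries.
    return [
        {"Objects": selected[i:i + batch_size], "Quiet": True}
        for i in range(0, len(selected), batch_size)
    ]
```

Each returned dictionary has the same shape as the aws_delete.json file above, so it can be written out with json.dump and passed to aws s3api delete-objects via --delete file://, one file per batch.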

Want to do more advanced stuff? Check out my gist.

@RahulAdepu92

Note: Unless "s3:DeleteObjectVersion" is included in the policy attached to your IAM role, deleting object versions won't work.
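
For reference, here is a sketch of what such a policy document might look like, built in Python so it can be printed as JSON. Treat it as an assumption about a typical setup rather than a complete policy: listing versions needs s3:ListBucketVersions on the bucket, deleting a specific version needs s3:DeleteObjectVersion on the objects, and your role may need more. The function name version_delete_policy is made up for this example.

```python
import json


def version_delete_policy(bucket):
    """Return a minimal IAM policy document for purging a versioned bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing versions is a bucket-level action.
                "Effect": "Allow",
                "Action": ["s3:ListBucketVersions"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Deleting a specific version needs s3:DeleteObjectVersion.
                "Effect": "Allow",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(version_delete_policy("your_bucket_name"), indent=2))
```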

@marcuspaget

Thanks @nashjain ... here is my version of yours :)

(echo -n '{"Objects":';aws s3api list-object-versions --bucket "$bucket" --prefix "$prefix" --max-items 1000 --query "Versions[?(LastModified<'2020-07-21')].{Key: Key, VersionId: VersionId}" | sed 's#]$#] , "Quiet":true}#') > _TMP_DELETE && aws s3api delete-objects --bucket "$bucket" --delete file://_TMP_DELETE

To do 1000 at a time.

@marcuspaget

marcuspaget commented Jul 29, 2020

Found I could put in a loop and get through about 3 iterations (or 3k objects a minute). So produced this script which downloads 10k objects, then uses jq to slice 1k at a time and deletes, looping 4k times. Now up to around 4.5k objects a minute.

bucket=_BUCKET_NAME_
prefix=_PREFIX_

cnt=0
FN=/tmp/_TMP_DELETE
rm $FN 2> /dev/null

while [ $cnt -lt 4000 ]
do
	aws s3api list-object-versions --bucket "$bucket" --prefix "$prefix" --max-items 10000 --query "Versions[?(LastModified<'2019-07-21')].{Key: Key, VersionId: VersionId}" > $FN
	rm $FN.upload 2> /dev/null
	s=0
	while [ $s -lt 9999 ]
	do
		((e=s+1000))
		#echo taking $s to $e
		# jq slices exclude the end index, so .[$s:$e] holds exactly 1000 entries and none are skipped
		(echo -n '{"Objects":';jq ".[$s:$e]" < $FN 2>&1 | sed 's#]$#] , "Quiet":true}#') > $FN.upload
		aws s3api delete-objects --bucket "$bucket" --delete file://$FN.upload && rm $FN.upload
		((s=e))
		#echo s is $s and e is $e
		echo -n "."
	done

((cnt++))
((tot=cnt*10))
echo on run $cnt total deleted ${tot}k objects

done

@marcuspaget

Okay ... faster still (~10k/min) - just dump all in the file then:

bucket=_BUCKET_
prefix=_PREFIX_
SRCFN=_DUMP_FILE_
FN=/tmp/_TMP_DELETE

aws s3api list-object-versions --bucket "$bucket" --prefix "$prefix" --query "Versions[?(LastModified<'2019-07-21')].{Key: Key, VersionId: VersionId}" > $SRCFN

rm $FN 2> /dev/null
s=0
c=`grep -c VersionId $SRCFN`

while [ $s -lt $c ]
do
	((e=s+1000))
	echo taking $s to $e
	# jq slices exclude the end index, so .[$s:$e] holds up to 1000 entries and none are skipped
	(echo -n '{"Objects":';jq ".[$s:$e]" < $SRCFN 2>&1 | sed 's#]$#] , "Quiet":true}#') > $FN
	aws s3api delete-objects --bucket "$bucket" --delete file://$FN && rm $FN
	((s=e))
	sleep 1
	#echo s is $s and e is $e
	#echo -n "."
done

@git-hemant

git-hemant commented Aug 31, 2020

Yet another minor update to fix the issue when the key (file name) contains spaces:

#!/bin/bash

bucket=$1

set -e

echo "Removing all versions from $bucket"

versions=`aws s3api list-object-versions --bucket $bucket |jq '.Versions'`
markers=`aws s3api list-object-versions --bucket $bucket |jq '.DeleteMarkers'`
# Arithmetic expansion instead of let: let returns non-zero (tripping set -e) when the result is 0
count=$(( `echo "$versions" |jq 'length'` - 1 ))

if [ $count -gt -1 ]; then
echo "removing files"

    for i in $(seq 0 $count); do
            key=`echo "$versions" | jq ".[$i].Key" |sed -e 's/\"//g'`
            versionId=`echo "$versions" | jq ".[$i].VersionId" |sed -e 's/\"//g'`
            cmd="aws s3api delete-object --bucket $bucket --key \"$key\" --version-id $versionId"
            echo $cmd
            eval $cmd
    done

fi

count=$(( `echo "$markers" |jq 'length'` - 1 ))

if [ $count -gt -1 ]; then
echo "removing delete markers"

    for i in $(seq 0 $count); do
            key=`echo "$markers" | jq ".[$i].Key" |sed -e 's/\"//g'`
            versionId=`echo "$markers" | jq ".[$i].VersionId" |sed -e 's/\"//g'`
            cmd="aws s3api delete-object --bucket $bucket --key \"$key\" --version-id $versionId"
            echo $cmd
            eval $cmd
    done

fi
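
Quoting problems with spaces in keys go away entirely if the command is never turned back into a single string. A minimal sketch of that idea in Python, assuming the aws CLI is on the PATH when the command is eventually run. The helper name delete_command and the sample key are illustrative only.

```python
import shlex


def delete_command(bucket, key, version_id):
    """Build the aws CLI call as an argument list, so no shell quoting is needed."""
    return [
        "aws", "s3api", "delete-object",
        "--bucket", bucket,
        "--key", key,
        "--version-id", version_id,
    ]


cmd = delete_command("your_bucket_name", "my folder/report 1.txt", "abc123")
# shlex.join (Python 3.8+) renders a copy-pasteable line, for logging only.
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Since the key travels as its own list element, spaces, quotes and dollar signs in file names reach the CLI unchanged, which eval cannot guarantee.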

@morufajibikekpmg

morufajibikekpmg commented Dec 22, 2020

AWS CLI requires python, and there's a much much better way to do this using python:

import boto3
session = boto3.Session()
s3 = session.resource(service_name='s3')
bucket = s3.Bucket('your_bucket_name')
bucket.object_versions.delete()
# bucket.delete()

If you want to use a named profile, this becomes:

import boto3
session = boto3.session.Session(profile_name='your_profile_name')
s3 = session.resource(service_name='s3')
bucket = s3.Bucket('your_bucket_name')

## uncomment the line below to delete your bucket objects versions; BE CAREFUL!!!
# bucket.object_versions.delete()

## uncomment the line below to delete your bucket; BE CAREFUL!!!
# bucket.delete()

@forzagreen

With the AWS CLI v2, by default it returns all output through a pager program (e.g. less). Cf. Output paging.
To disable it, run:

export AWS_PAGER=""

@l0b0

l0b0 commented Mar 15, 2021

Another version:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail

if [[ "$#" -eq 0 ]]
then
    cat >&2 << 'EOF'
./clear-s3-buckets.bash BUCKET [BUCKET…]

Deletes *all* versions of *all* files in *all* given buckets. Only to be used in case of emergency!
EOF
    exit 1
fi

read -n1 -p "THIS WILL DELETE EVERYTHING IN BUCKETS ${*}! Press Ctrl-c to cancel or anything else to continue: " -r

delete_objects() {
    count="$(jq length <<< "$1")"

    if [[ "$count" -eq 0 ]]
    then
        echo "No objects found; skipping" >&2
        return
    fi

    echo "Removing objects"
    for index in $(seq 0 $(("$count" - 1)))
    do
        key="$(jq --raw-output ".[${index}].Key" <<< "$1")"
        version_id="$(jq --raw-output ".[${index}].VersionId" <<< "$1")"
        delete_command=(aws s3api delete-object --bucket="$bucket" --key="$key" --version-id="$version_id")
        printf '%q ' "${delete_command[@]}"
        printf '\n'
        "${delete_command[@]}"
    done
}

for bucket
do
    versions="$(aws s3api list-object-versions --bucket="$bucket" | jq .Versions)"
    delete_objects "$versions"

    markers="$(aws s3api list-object-versions --bucket="$bucket" | jq .DeleteMarkers)"
    delete_objects "$markers"
done

Improvements:

  • Passes shellcheck
  • Idiomatic Bash
  • Safety pragmas at the top
  • Reuses loop code
  • Uses More Quotes™
  • Simplified commands by using jq's --raw-output
  • Various ergonomics like a warning prompt, printing if no entries were found, escaping the command when printing it, and usage instructions
  • Processes multiple buckets

@andy-b-84

Came up with this version, using headless commands and specifying region and profile:
https://gist.github.com/andy-b-84/9b9df3dc9ca8f7d50cd910b23cea5e0e

@kayomarz

kayomarz commented Jul 4, 2021

This gist was very useful.

This error occurs when the aws command's default output format is not json:

parse error: Invalid numeric literal at line 2, column 0

This has a very simple fix:

Wherever aws command output is passed to jq, let the script specify --output=json.

For instance:

versions=`aws s3api list-object-versions --bucket $bucket |jq '.Versions'`

becomes

versions=`aws --output=json s3api list-object-versions --bucket $bucket |jq '.Versions'`

@l0b0

l0b0 commented Jul 4, 2021

@kayomarz I think that might be a setting on your side - I don't need --output=json.

@kayomarz

kayomarz commented Jul 5, 2021

@kayomarz I think that might be a setting on your side - I don't need --output=json.

@l0b0 Yes, my aws CLI is configured with output = table (aws CLI output is no longer json) and this script results in parse error: Invalid numeric literal at line 2, column 0.

Passing --output=json as mentioned above fixes the error.

@justinTM

justinTM commented Mar 7, 2022

you can use jq -r flag to remove quotation chars " from query results instead of sed btw

@davidwelborn

for some bizarre reason, this line does not work for me:
version_id="$(jq --raw-output ".[${index}].VersionId" <<< "$1")"
