
@weavenet
Created May 4, 2015 05:21
Delete all versions of all files in an S3 versioned bucket using the AWS CLI and jq.
#!/bin/bash
bucket=$1
set -e
echo "Removing all versions from $bucket"
versions=$(aws s3api list-object-versions --bucket "$bucket" | jq '.Versions')
markers=$(aws s3api list-object-versions --bucket "$bucket" | jq '.DeleteMarkers')
# Plain arithmetic instead of `let`: `let` returns status 1 when the result is 0
# (exactly one entry), which would abort the script under set -e.
count=$(( $(echo "$versions" | jq 'length') - 1 ))
if [ "$count" -gt -1 ]; then
    echo "removing files"
    for i in $(seq 0 "$count"); do
        # jq -r prints the raw string, so no sed is needed to strip the quotes
        key=$(echo "$versions" | jq -r ".[$i].Key")
        versionId=$(echo "$versions" | jq -r ".[$i].VersionId")
        # Call aws directly with quoted arguments so keys with spaces stay intact
        echo "aws s3api delete-object --bucket $bucket --key \"$key\" --version-id $versionId"
        aws s3api delete-object --bucket "$bucket" --key "$key" --version-id "$versionId"
    done
fi
count=$(( $(echo "$markers" | jq 'length') - 1 ))
if [ "$count" -gt -1 ]; then
    echo "removing delete markers"
    for i in $(seq 0 "$count"); do
        key=$(echo "$markers" | jq -r ".[$i].Key")
        versionId=$(echo "$markers" | jq -r ".[$i].VersionId")
        echo "aws s3api delete-object --bucket $bucket --key \"$key\" --version-id $versionId"
        aws s3api delete-object --bucket "$bucket" --key "$key" --version-id "$versionId"
    done
fi
@cbrinker

cbrinker commented Sep 5, 2017

The script was generally well behaved; however, keys with spaces caused some trouble. Thanks for sharing!
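Keys with spaces break when a command is assembled into a string like $cmd and then run unquoted, because the shell splits the string on whitespace before aws ever sees it. The following is a minimal Python sketch of the alternative, assuming you drive the AWS CLI from Python. It builds the arguments as a list and passes them to subprocess.run without a shell, so each key stays a single argument. The helper names are hypothetical.

```python
import subprocess
from typing import List


def delete_version_argv(bucket: str, key: str, version_id: str) -> List[str]:
    """Build the aws CLI arguments for deleting one object version.

    Each value is its own list element, so spaces (or quotes) inside the key
    are never re-split the way an unquoted shell variable would be.
    """
    return [
        "aws", "s3api", "delete-object",
        "--bucket", bucket,
        "--key", key,
        "--version-id", version_id,
    ]


def delete_version(bucket: str, key: str, version_id: str) -> None:
    # No shell is involved: subprocess.run hands the list straight to aws.
    subprocess.run(delete_version_argv(bucket, key, version_id), check=True)


if __name__ == "__main__":
    argv = delete_version_argv("my-bucket", "reports/Q1 summary.pdf", "abc123")
    print(argv)
    # A plain whitespace split (what an unquoted $cmd amounts to) cuts the key in two:
    print(" ".join(argv).split())
```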

@jeroenbaas

jeroenbaas commented Oct 2, 2017

This can be done much more efficiently by making use of the --query parameter:
aws s3api list-object-versions --bucket $bucket --prefix $somePrefixToFilterByIfYouNeedTo --query "[Versions,DeleteMarkers][].{Key: Key, VersionId: VersionId}"
after which you can just loop over the results in one go.
I was looking for something like this to clean up a slow bucket with millions of deleted keys (removing them could potentially speed the bucket up). The code above, however, would sit for days fetching bloated JSON (twice!).

Here is an example of the entire script (including a prefix in case you only want to clear a subset of a bucket). It uses --output text to loop over the results in text mode, which has even less overhead.

#!/bin/bash

bucket=$1
prefix=$2
set -e

echo "Removing all versions from $bucket, prefix $prefix"

# --output text separates columns with tabs, so split each line on tabs only:
# spaces inside keys are preserved without any placeholder substitution.
while IFS=$'\t' read -r key versionId
do
    echo "key: ${key} versionId: ${versionId}"
    # Quoted arguments and no eval: keys containing spaces, quotes or $ are passed through unchanged.
    aws s3api delete-object --bucket "$bucket" --key "$key" --version-id "$versionId"
done < <(aws s3api list-object-versions --bucket "$bucket" --prefix "$prefix" --query "[Versions,DeleteMarkers][].{Key: Key, VersionId: VersionId}" --output text)
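The point of the --query above is that a single listing call yields both object versions and delete markers. The following sketches the same flattening in Python for a single response page shaped like what boto3's list_object_versions returns. The function name and sample data are made up for illustration.

```python
from typing import Dict, List, Tuple


def versions_and_markers(page: Dict) -> List[Tuple[str, str]]:
    """Return (Key, VersionId) pairs for every version and delete marker.

    Mirrors "[Versions,DeleteMarkers][].{Key: Key, VersionId: VersionId}":
    either list may be absent from a page, so a missing list counts as empty.
    """
    pairs = []
    for section in ("Versions", "DeleteMarkers"):
        for entry in page.get(section) or []:
            pairs.append((entry["Key"], entry["VersionId"]))
    return pairs


if __name__ == "__main__":
    sample_page = {
        "Versions": [{"Key": "a.txt", "VersionId": "v1", "Size": 3}],
        "DeleteMarkers": [{"Key": "b.txt", "VersionId": "m1"}],
    }
    print(versions_and_markers(sample_page))
```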

@jnawk

jnawk commented Oct 26, 2017

AWS CLI requires python, and there's a much much better way to do this using python:

import boto3
session = boto3.Session()
s3 = session.resource(service_name='s3')
bucket = s3.Bucket('your_bucket_name')
bucket.object_versions.delete()
# bucket.delete()

@nicdoye

nicdoye commented Oct 30, 2017

@jnawk Nice! Minor typo: it should be boto3.Session() not boto3.session().

@mattbryson

mattbryson commented Nov 9, 2017

@jnawk That's awesome! Saved me so much time. Two quick questions...

  1. object_versions.delete() doesn't appear to remove zero byte keys, any idea how to get round that?
  2. is there a way to enable verbose output on boto3? .delete() can take a long time on large buckets, would be good to see some sort of progress...

UPDATE: it's not zero-byte keys, it's macOS "Icon?" files. Their file name ends in a line-break character (the "?"), and once uploaded to S3 that breaks all the S3 tooling, even the console. I have raised this with AWS.
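On the progress question: bucket.object_versions.delete() gives no feedback while it runs. One alternative is to page through list_object_versions yourself and log a running total after each delete_objects call. The sketch below assumes you use a low-level boto3 client (boto3.client("s3")) rather than the resource API. The function name and log format are made up for illustration.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 1000  # delete_objects accepts at most 1000 keys per request


def delete_all_versions(client, bucket: str, prefix: str = "",
                        log: Callable[[str], None] = print) -> int:
    """Delete all object versions and delete markers under `prefix`, logging progress.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the number of entries S3 did not report as errors.
    """
    paginator = client.get_paginator("list_object_versions")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page may contain Versions, DeleteMarkers, both, or neither.
        objects: List[Dict[str, str]] = [
            {"Key": entry["Key"], "VersionId": entry["VersionId"]}
            for section in ("Versions", "DeleteMarkers")
            for entry in page.get(section) or []
        ]
        for start in range(0, len(objects), BATCH_SIZE):
            batch = objects[start:start + BATCH_SIZE]
            response = client.delete_objects(
                Bucket=bucket, Delete={"Objects": batch, "Quiet": True}
            )
            # With Quiet=True, S3 only lists the entries it failed to delete.
            errors = response.get("Errors", [])
            for err in errors:
                log(f"failed: {err.get('Key')} ({err.get('VersionId')}): {err.get('Message')}")
            deleted += len(batch) - len(errors)
            log(f"deleted {deleted} versions/markers so far")
    return deleted
```

Usage might look like delete_all_versions(boto3.client("s3"), "your_bucket_name", prefix="folder/"). This sketch prints one line per batch of up to 1000 entries.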

@joelthompson

There's a small bug when you have only a single object and/or delete marker. Basically, with this:

let count=`echo $versions |jq 'length'`-1

If count is 0, bash treats that as an error, because let returns exit status 1 whenever its expression evaluates to 0. Since you do a set -e above, this causes the script to fail out.

@JohnVonNeumann

You are a champion amongst men. Cheers.

@arichiardi

arichiardi commented May 9, 2018

Thanks for this script! I tweaked the original one to URL-decode UTF-8 keys coming from the bucket:

#!/bin/bash

bucket=$1

set -e

echo "Removing all versions from $bucket"

function urldecode {
    # Python 3 version: urllib.unquote_plus and the print statement are Python 2 only
    python3 -c "import sys, urllib.parse as ul; print(ul.unquote_plus(sys.argv[1]))" "$1"
}

# "// []" turns a missing list into an empty one so ".[]" below does not fail
versions=$(aws s3api list-object-versions --encoding-type url --bucket "$bucket" | jq '.Versions // []')
markers=$(aws s3api list-object-versions --encoding-type url --bucket "$bucket" | jq '.DeleteMarkers // []')

echo "removing files"
for version in $(echo "${versions}" | jq -r '.[] | @base64'); do
    version=$(echo "${version}" | base64 --decode)

    key=$(echo "$version" | jq -r .Key)
    versionId=$(echo "$version" | jq -r .VersionId)
    # Only the key is URL-encoded; the version ID is used as-is
    decodedKey=$(urldecode "$key")
    echo "aws s3api delete-object --bucket $bucket --key \"$decodedKey\" --version-id $versionId"
    aws s3api delete-object --bucket "$bucket" --key "$decodedKey" --version-id "$versionId"
done

echo "removing delete markers"
for marker in $(echo "${markers}" | jq -r '.[] | @base64'); do
    marker=$(echo "${marker}" | base64 --decode)

    key=$(echo "$marker" | jq -r .Key)
    versionId=$(echo "$marker" | jq -r .VersionId)
    decodedKey=$(urldecode "$key")
    echo "aws s3api delete-object --bucket $bucket --key \"$decodedKey\" --version-id $versionId"
    aws s3api delete-object --bucket "$bucket" --key "$decodedKey" --version-id "$versionId"
done
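For reference, the following sketches the same decoding in Python 3, applied to a list of entries from a list-object-versions response fetched with --encoding-type url. Like the urldecode helper, it relies on urllib's unquote_plus. The decode_keys name is hypothetical.

```python
from typing import Dict, List
from urllib.parse import unquote_plus


def decode_keys(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the entries with each Key URL-decoded.

    unquote_plus decodes percent-escapes as UTF-8 and turns "+" into a space;
    every other field (such as VersionId) is copied unchanged.
    """
    return [{**entry, "Key": unquote_plus(entry["Key"])} for entry in entries]


if __name__ == "__main__":
    print(decode_keys([{"Key": "caf%C3%A9+menu.txt", "VersionId": "v1"}]))
```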

@tokozedg

bucket.object_versions.filter(
        Prefix='folder'
).delete()

This worked for me very well.

@ip1981

ip1981 commented Sep 24, 2018

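Guys, there is set -x for this. It makes bash print each command before running it, so you don't need to build and echo the command yourself like this: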

cmd="aws s3api delete-object --bucket $bucket --key $key --version-id $versionId"
echo $cmd

@ip1981

ip1981 commented Sep 24, 2018

And you can use jq to build up command lines:

... | jq -r '.Versions[] | "aws s3api delete-object --bucket capitalmatch-backups --key \"\(.Key)\" --version-id \"\(.VersionId)\""'
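One caveat with generating command lines as strings is that a key containing a double quote, $ or a backtick will break, or worse, when the generated lines are run by a shell. The following Python sketch produces the same kind of command lines but escapes each argument with shlex.quote. The function name is hypothetical.

```python
import shlex
from typing import Dict, Iterable, List


def delete_command_lines(bucket: str, versions: Iterable[Dict[str, str]]) -> List[str]:
    """Return one shell-safe "aws s3api delete-object" line per version.

    shlex.quote escapes every argument, so keys with spaces, double quotes
    or $ come back unchanged when a POSIX shell parses the line.
    """
    lines = []
    for v in versions:
        argv = ["aws", "s3api", "delete-object", "--bucket", bucket,
                "--key", v["Key"], "--version-id", v["VersionId"]]
        lines.append(" ".join(shlex.quote(arg) for arg in argv))
    return lines


if __name__ == "__main__":
    for line in delete_command_lines("my-bucket", [{"Key": 'odd "name" $HOME.txt', "VersionId": "v1"}]):
        print(line)
```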

@felipekiko

Works for me with some encoded filenames in versioned files. Thanks!!

@wknapik

wknapik commented Mar 11, 2019

@kaosinc

kaosinc commented Feb 16, 2020

Thank you SO MUCH!

@nashjain

nashjain commented Apr 8, 2020

There is actually a much simpler and faster approach:

bucket=$1
fileToDelete=$2
deleteBefore=$3
fileName='aws_delete.json'
rm -f "$fileName"  # -f: no error if the file does not exist yet
versionsToDelete=`aws s3api list-object-versions --bucket "$bucket" --prefix "$fileToDelete" --query "Versions[?(LastModified<'$deleteBefore')].{Key: Key, VersionId: VersionId}"`
cat << EOF > $fileName
{"Objects":$versionsToDelete, "Quiet":true}
EOF
aws s3api delete-objects --bucket "$bucket" --delete file://$fileName

s3api delete-objects can handle up to 1000 keys per request, so this works as long as the query returns at most 1000 versions.

Want to do more advanced stuff? Check out my gist.
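To go past the 1000-key limit, the filtered versions can be split into several payloads of the {"Objects": [...], "Quiet": true} shape used above. The following Python sketch assumes LastModified values are timezone-aware datetime objects, as boto3 returns them (the CLI prints them as strings instead). The function name is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable, List

MAX_KEYS_PER_DELETE = 1000  # delete-objects accepts at most 1000 keys per request


def delete_payloads(versions: Iterable[Dict], before: datetime,
                    size: int = MAX_KEYS_PER_DELETE) -> List[Dict]:
    """Group versions last modified before `before` into delete-objects payloads.

    Each payload holds at most `size` entries, so any number of versions can be
    removed with a series of delete-objects calls.
    """
    selected = [
        {"Key": v["Key"], "VersionId": v["VersionId"]}
        for v in versions
        if v["LastModified"] < before
    ]
    return [
        {"Objects": selected[i:i + size], "Quiet": True}
        for i in range(0, len(selected), size)
    ]


if __name__ == "__main__":
    cutoff = datetime(2020, 7, 21, tzinfo=timezone.utc)
    sample = [
        {"Key": "a.txt", "VersionId": "v1",
         "LastModified": datetime(2019, 5, 1, tzinfo=timezone.utc)},
        {"Key": "b.txt", "VersionId": "v2",
         "LastModified": datetime(2021, 1, 1, tzinfo=timezone.utc)},
    ]
    # Each payload could be saved to a file and passed as --delete file://<that file>
    for payload in delete_payloads(sample, cutoff):
        print(json.dumps(payload))
```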

@RahulAdepu92

Note: unless the policy attached to your IAM role includes "s3:DeleteObjectVersion", deleting versions won't work.
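As a starting point, the following Python sketch builds a minimal policy document covering the two permissions these scripts rely on: s3:ListBucketVersions to list versions, and s3:DeleteObjectVersion to delete a specific version or delete marker. The bucket name is a placeholder. Your setup may need more, for example s3:DeleteObject if you also delete without version IDs.

```python
import json
from typing import Dict


def version_cleanup_policy(bucket: str) -> Dict:
    """Build a minimal IAM policy document for listing and deleting object versions.

    s3:ListBucketVersions applies to the bucket ARN; s3:DeleteObjectVersion
    applies to the objects inside it (bucket ARN + "/*").
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucketVersions",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": "s3:DeleteObjectVersion",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(version_cleanup_policy("your_bucket_name"), indent=2))
```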

@marcuspaget

Thanks @nashjain... here is my version based on yours :)

(echo -n '{"Objects":';aws s3api list-object-versions --bucket "$bucket" --prefix "$prefix" --max-items 1000 --query "Versions[?(LastModified<'2020-07-21')].{Key: Key, VersionId: VersionId}" | sed 's#]$#] , "Quiet":true}#') > _TMP_DELETE && aws s3api delete-objects --bucket "$bucket" --delete file://_TMP_DELETE

This handles 1000 at a time.

@marcuspaget

marcuspaget commented Jul 29, 2020

I found I could put this in a loop and get through about 3 iterations (about 3k objects) a minute. So I produced this script, which downloads 10k objects, then uses jq to slice off 1k at a time and delete them, repeating the whole thing 4k times. It's now up to around 4.5k objects a minute.

bucket=_BUCKET_NAME_
prefix=_PREFIX_

cnt=0
FN=/tmp/_TMP_DELETE
rm $FN 2> /dev/null

while [ $cnt -lt 4000 ]
do
	aws s3api list-object-versions --bucket "$bucket" --prefix "$prefix" --max-items 10000 --query "Versions[?(LastModified<'2019-07-21')].{Key: Key, VersionId: VersionId}" > $FN
	rm $FN.upload 2> /dev/null
	s=0
	while [ $s -lt 10000 ]
	do
		# jq's .[s:e] excludes index e, so e=s+1000 gives exactly 1000 objects per batch
		((e=s+1000))
		#echo taking $s to $e
		(echo -n '{"Objects":';jq ".[$s:$e]" < $FN 2>&1 | sed 's#]$#] , "Quiet":true}#') > $FN.upload
		aws s3api delete-objects --bucket "$bucket" --delete file://$FN.upload && rm $FN.upload
		((s=e))
		#echo s is $s and e is $e
		echo -n "."
	done

((cnt++))
((tot=cnt*10))
echo on run $cnt total deleted ${tot}k objects

done
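jq's array slices .[s:e] exclude the end index, just like Python's. The following Python check shows what that means for a batching loop. Advancing the start by exactly the batch size covers every entry, while taking .[s:s+999] and then jumping to s = e + 1 leaves one entry behind per batch. Both function names are hypothetical.

```python
from typing import List, Sequence


def batches(items: Sequence, size: int = 1000) -> List[Sequence]:
    """Split items into consecutive batches of at most `size` elements.

    Slicing excludes the end index, so stepping the start by exactly `size`
    visits every element once.
    """
    return [items[s:s + size] for s in range(0, len(items), size)]


def off_by_one_batches(items: Sequence, size: int = 1000) -> List[Sequence]:
    """For comparison: e = s + size - 1, take items[s:e], then s = e + 1."""
    out = []
    s = 0
    while s < len(items):
        e = s + size - 1
        out.append(items[s:e])  # holds size - 1 items; index e is never taken
        s = e + 1
    return out


if __name__ == "__main__":
    data = list(range(2500))
    print([len(b) for b in batches(data)])
    print([len(b) for b in off_by_one_batches(data)])
```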

@marcuspaget

Okay... faster still (~10k/min): just dump everything into the file first, then:

bucket=_BUCKET_
prefix=_PREFIX_
SRCFN=_DUMP_FILE_
FN=/tmp/_TMP_DELETE

aws s3api list-object-versions --bucket "$bucket" --prefix "$prefix" --query "Versions[?(LastModified<'2019-07-21')].{Key: Key, VersionId: VersionId}" > $SRCFN

rm $FN 2> /dev/null
s=0
c=`grep -c VersionId $SRCFN`

while [ $s -lt $c ]
do
	# jq's .[s:e] excludes index e, so e=s+1000 gives exactly 1000 objects per batch
	((e=s+1000))
	echo taking $s to $((e-1))
	(echo -n '{"Objects":';jq ".[$s:$e]" < $SRCFN 2>&1 | sed 's#]$#] , "Quiet":true}#') > $FN
	aws s3api delete-objects --bucket "$bucket" --delete file://$FN && rm $FN
	((s=e))
	sleep 1
	#echo s is $s and e is $e
	#echo -n "."
done

@git-hemant

git-hemant commented Aug 31, 2020

Yet another minor update, to fix the issue when the key (file name) contains spaces:

#!/bin/bash

bucket=$1

set -e

echo "Removing all versions from $bucket"

versions=`aws s3api list-object-versions --bucket $bucket |jq '.Versions'`
markers=`aws s3api list-object-versions --bucket $bucket |jq '.DeleteMarkers'`
# arithmetic expansion instead of let, which fails under set -e when the result is 0
count=$(( $(echo "$versions" | jq 'length') - 1 ))

if [ $count -gt -1 ]; then
    echo "removing files"
    for i in $(seq 0 $count); do
        key=`echo "$versions" | jq ".[$i].Key" |sed -e 's/\"//g'`
        versionId=`echo "$versions" | jq ".[$i].VersionId" |sed -e 's/\"//g'`
        # escaped double quotes around the key keep spaces intact through eval
        cmd="aws s3api delete-object --bucket $bucket --key \"$key\" --version-id $versionId"
        echo $cmd
        eval $cmd
    done
fi

count=$(( $(echo "$markers" | jq 'length') - 1 ))

if [ $count -gt -1 ]; then
    echo "removing delete markers"

    for i in $(seq 0 $count); do
        key=`echo "$markers" | jq ".[$i].Key" |sed -e 's/\"//g'`
        versionId=`echo "$markers" | jq ".[$i].VersionId" |sed -e 's/\"//g'`
        cmd="aws s3api delete-object --bucket $bucket --key \"$key\" --version-id $versionId"
        echo $cmd
        eval $cmd
    done

fi

@morufajibikekpmg

morufajibikekpmg commented Dec 22, 2020

AWS CLI requires python, and there's a much much better way to do this using python:

import boto3
session = boto3.session()
s3 = session.resource(service_name='s3')
bucket = s3.Bucket('your_bucket_name')
bucket.object_versions.delete()
# bucket.delete()

If you want to use a named profile, this could be:

import boto3
session = boto3.session.Session(profile_name='your_profile_name')
s3 = session.resource(service_name='s3')
bucket = s3.Bucket('your_bucket_name')

## uncomment the line below to delete your bucket objects versions; BE CAREFUL!!!
# bucket.object_versions.delete()

## uncomment the line below to delete your bucket; BE CAREFUL!!!
# bucket.delete()

@forzagreen

By default, AWS CLI v2 sends all output through a pager program (e.g. less). See Output paging.
To disable it, run:

export AWS_PAGER=""

@l0b0

l0b0 commented Mar 15, 2021

Another version:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail

if [[ "$#" -eq 0 ]]
then
    cat >&2 << 'EOF'
./clear-s3-buckets.bash BUCKET [BUCKET…]

Deletes *all* versions of *all* files in *all* given buckets. Only to be used in case of emergency!
EOF
    exit 1
fi

read -n1 -p "THIS WILL DELETE EVERYTHING IN BUCKETS ${*}! Press Ctrl-c to cancel or anything else to continue: " -r

delete_objects() {
    count="$(jq length <<< "$1")"

    if [[ "$count" -eq 0 ]]
    then
        echo "No objects found; skipping" >&2
        return
    fi

    echo "Removing objects"
    for index in $(seq 0 $(("$count" - 1)))
    do
        key="$(jq --raw-output ".[${index}].Key" <<< "$1")"
        version_id="$(jq --raw-output ".[${index}].VersionId" <<< "$1")"
        delete_command=(aws s3api delete-object --bucket="$bucket" --key="$key" --version-id="$version_id")
        printf '%q ' "${delete_command[@]}"
        printf '\n'
        "${delete_command[@]}"
    done
}

for bucket
do
    versions="$(aws s3api list-object-versions --bucket="$bucket" | jq .Versions)"
    delete_objects "$versions"

    markers="$(aws s3api list-object-versions --bucket="$bucket" | jq .DeleteMarkers)"
    delete_objects "$markers"
done

Improvements:

  • Passes shellcheck
  • Idiomatic Bash
  • Safety pragmas at the top
  • Reuses loop code
  • Uses More Quotes™
  • Simplified commands by using jq's --raw-output
  • Various ergonomics like a warning prompt, printing if no entries were found, escaping the command when printing it, and usage instructions
  • Processes multiple buckets

@andy-b-84

I came up with this version, using headless commands and specifying region and profile:
https://gist.github.com/andy-b-84/9b9df3dc9ca8f7d50cd910b23cea5e0e

@kayomarz

kayomarz commented Jul 4, 2021

This gist was very useful.

This error occurs when the aws command's default output format is not JSON:

parse error: Invalid numeric literal at line 2, column 0

This has a very simple fix:

Wherever aws command output is passed to jq, let the script specify --output=json.

For instance:

versions=`aws s3api list-object-versions --bucket $bucket |jq '.Versions'`

becomes

versions=`aws --output=json s3api list-object-versions --bucket $bucket |jq '.Versions'`

@l0b0

l0b0 commented Jul 4, 2021

@kayomarz I think that might be a setting on your side - I don't need --output=json.

@kayomarz

kayomarz commented Jul 5, 2021

@kayomarz I think that might be a setting on your side - I don't need --output=json.

@l0b0 Yes, my AWS CLI is configured with output = table (so its output is no longer JSON), and this script then fails with parse error: Invalid numeric literal at line 2, column 0.

Adding --output=json, as mentioned above, fixes the error.

@justinTM

justinTM commented Mar 7, 2022

You can use jq's -r flag to strip the quotation marks (") from query results instead of sed, btw.
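The difference matters for keys that contain a double quote. jq prints such a key as a JSON string with \" escapes. jq -r decodes it, whereas the sed substitution deletes every quote and leaves the backslashes behind. Python's json.loads does the same decoding as jq -r for a single string, which makes the effect easy to see. The helper names are made up for illustration.

```python
import json


def jq_raw(json_string: str) -> str:
    """Decode a JSON string literal the way `jq -r` prints it: escapes resolved, quotes gone."""
    return json.loads(json_string)


def strip_quotes(json_string: str) -> str:
    """Mimic sed -e 's/"//g': delete every double-quote character, escaped or not."""
    return json_string.replace('"', "")


if __name__ == "__main__":
    literal = '"my \\"quoted\\" key.txt"'  # how jq prints the key: my "quoted" key.txt
    print(jq_raw(literal))
    print(strip_quotes(literal))
```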

@davidwelborn

For some bizarre reason, this line does not work for me:
version_id="$(jq --raw-output ".[${index}].VersionId" <<< "$1")"
