@tegansnyder
Created October 17, 2017 20:34
HDFS cron to delete old files in directory
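#!/bin/bash
# Delete files in an HDFS directory that are older than a given number of days.
now=$(date +%s)
days_to_keep=5
# Loop through the directory listing, one entry per line
hdfs dfs -ls /some_hdfs_directory | while read -r f; do
  # Get the modification date (field 6) and the path (field 8)
  file_date=$(echo "$f" | awk '{print $6}')
  file_name=$(echo "$f" | awk '{print $8}')
  # Skip the "Found N items" header line, which has no path field
  [ -z "$file_name" ] && continue
  echo "$file_date"
  echo "$file_name"
  # Calculate the age in whole days (date -d requires GNU date)
  difference=$(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))
  if [ "$difference" -gt "$days_to_keep" ]; then
    echo "Deleting $file_name: it is older than $days_to_keep days and is dated $file_date."
    hdfs dfs -rm "$file_name"
  fi
done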
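The age check inside the loop can be pulled into a small function so it can be tried without an HDFS cluster. This is a minimal sketch, assuming GNU date; `age_in_days` is a hypothetical helper name that does not appear in the original script, and its optional second argument (a reference time in epoch seconds) is added here only for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist): prints the whole number
# of days between a YYYY-MM-DD date and a reference time.
# Assumes GNU date, which supports the -d option.
age_in_days() {
  local file_date=$1
  local now=${2:-$(date +%s)}   # reference time in epoch seconds; defaults to the current time
  local file_epoch
  # Bail out with a non-zero status if the date string cannot be parsed
  file_epoch=$(date -d "$file_date" +%s) || return 1
  echo $(( (now - file_epoch) / 86400 ))
}
```

Inside the loop, the comparison would then read `if [ "$(age_in_days "$file_date")" -gt "$days_to_keep" ]; then`. Because the result is truncated to whole days, a file dated exactly `days_to_keep` days ago is kept, matching the `-gt` test in the script.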
@mishmam3
Thank you, this is very helpful 🙏
