@Foadsf
Last active November 20, 2024 08:54
🔄 Cross-platform PowerShell script for intelligent Git file move detection and history preservation (experimental - use at your own risk)

git-smart-move.ps1

A cross-platform PowerShell script that intelligently detects moved/renamed files in Git repositories and preserves their history using git-filter-repo.

⚠️ WARNING: This script is experimental and has not been thoroughly tested in production environments. Use at your own risk. Make sure to have backups before running it on your repository.

Requirements

  • PowerShell Core (Windows/Linux/macOS)
  • Git
  • git-filter-repo (pip install git-filter-repo)
#!/usr/bin/env pwsh
<#
.SYNOPSIS
Intelligently detects and handles Git file moves while preserving history.
.DESCRIPTION
This script analyzes a Git repository to find apparent file moves/renames,
verifies them through multiple methods, and uses git-filter-repo to
properly preserve their history.
.PARAMETER Path
The path to the Git repository. Defaults to current directory.
.PARAMETER MinSimilarity
Minimum similarity percentage to consider files as moved/renamed.
Default is 60 (Git's own rename detection defaults to 50).
.PARAMETER DryRun
If specified, shows what would be done without making changes.
.EXAMPLE
./git-smart-move.ps1 -Path ./my-repo
./git-smart-move.ps1 -MinSimilarity 80 -DryRun
#>
[CmdletBinding()]
param(
    [Parameter()]
    [string]$Path = ".",
    [Parameter()]
    [int]$MinSimilarity = 60,
    [Parameter()]
    [switch]$DryRun
)
# Class to represent a potential move
class FileMove {
    [string]$OldPath
    [string]$NewPath
    [int]$Similarity
    [string]$Method
    [bool]$Confirmed
    FileMove([string]$old, [string]$new, [int]$sim, [string]$method) {
        $this.OldPath = $old
        $this.NewPath = $new
        $this.Similarity = $sim
        $this.Method = $method
        $this.Confirmed = $false
    }
    [string] ToString() {
        return "$($this.OldPath) → $($this.NewPath) ($($this.Similarity)% similar, detected by $($this.Method))"
    }
}
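# Function to verify Git repository
function Test-GitRepository {
    param([string]$Path)
    try {
        # -C runs Git inside $Path, so the caller's location is never changed
        # (the earlier Push-Location/Pop-Location pair could pop the wrong
        # location when $Path did not exist)
        $null = git -C $Path rev-parse --git-dir 2>$null
        return $LASTEXITCODE -eq 0
    }
    catch {
        # Reached when git itself cannot be started
        return $false
    }
}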
# Function to check for git-filter-repo
function Test-GitFilterRepo {
    try {
        $null = git-filter-repo --version 2>$null
        return $true
    }
    catch {
        Write-Host "git-filter-repo not found. Please install it first:" -ForegroundColor Red
        Write-Host "pip install git-filter-repo" -ForegroundColor Yellow
        return $false
    }
}
# Function to get deleted files
function Get-DeletedFiles {
    # Tracked files that are missing from the working tree
    return @(git ls-files --deleted 2>$null)
}
# Function to get untracked files
function Get-UntrackedFiles {
    # New files that are neither tracked nor ignored
    return @(git ls-files --others --exclude-standard 2>$null)
}
# Function to find potential moves by filename
function Find-PotentialMovesByName {
    param(
        [string[]]$DeletedFiles,
        [string[]]$UntrackedFiles
    )
    $moves = @()
    foreach ($deleted in $DeletedFiles) {
        $baseName = Split-Path -Leaf $deleted
        $similar = $UntrackedFiles | Where-Object {
            (Split-Path -Leaf $_) -eq $baseName
        }
        foreach ($match in $similar) {
            # Only the file name is compared here; the reported 100 does not
            # mean the contents were checked
            $moves += [FileMove]::new($deleted, $match, 100, "exact_name_match")
        }
    }
    return $moves
}
# Function to find potential moves by content similarity
function Find-PotentialMovesByContent {
    param(
        [string[]]$DeletedFiles,
        [string[]]$UntrackedFiles,
        [int]$MinSimilarity
    )
    $moves = @()
    # Exact matches: compare the blob ID stored in HEAD with the hash of each
    # untracked file (piping text through PowerShell can alter line endings)
    foreach ($deleted in $DeletedFiles) {
        $oldHash = git rev-parse --verify --quiet "HEAD:$deleted" 2>$null
        if (-not $oldHash) { continue }
        foreach ($untracked in $UntrackedFiles) {
            if (-not (Test-Path -LiteralPath $untracked)) { continue }
            $newHash = git hash-object -- $untracked
            if ($oldHash -eq $newHash) {
                $moves += [FileMove]::new($deleted, $untracked, 100, "content_hash")
            }
        }
    }
    # Partial matches: "git diff" has no --percentage option and the deleted file
    # is no longer on disk, so stage everything in a throwaway index and let
    # Git's rename detection (-M) report the similarity scores
    $tempIndex = Join-Path ([System.IO.Path]::GetTempPath()) "git-index-$(New-Guid)"
    $env:GIT_INDEX_FILE = $tempIndex
    try {
        git read-tree HEAD
        git add --all 2>$null
        $renames = git diff --cached "-M${MinSimilarity}%" --diff-filter=R --name-status HEAD
    }
    finally {
        Remove-Item Env:GIT_INDEX_FILE
        Remove-Item -LiteralPath $tempIndex -ErrorAction SilentlyContinue
    }
    foreach ($line in $renames) {
        # Rename records look like "R087<TAB>old/path<TAB>new/path"
        if ($line -match '^R(\d+)\t([^\t]+)\t([^\t]+)$' -and [int]$Matches[1] -lt 100) {
            $moves += [FileMove]::new($Matches[2], $Matches[3], [int]$Matches[1], "content_similarity")
        }
    }
    return $moves
}
# Function to generate filter-repo script
# Note: this is left over from the python-callback approach. The main block
# below passes --path-rename arguments instead, so the generated file is
# created and cleaned up but never handed to git-filter-repo.
function New-FilterRepoScript {
    param([FileMove[]]$Moves)
    $scriptPath = Join-Path ([System.IO.Path]::GetTempPath()) "git-moves-$(New-Guid).py"
    $script = @"
import fastimport.commands
def adjust_path(path):
    path_str = path.decode('utf-8')
    moves = {
$(foreach ($move in $Moves) {
    "        '$($move.OldPath)': '$($move.NewPath)',"
})
    }
    return moves.get(path_str, path_str).encode('utf-8')
def filter_commit(commit):
    for change in commit.file_changes:
        change.path = adjust_path(change.path)
        if hasattr(change, 'new_path'):
            change.new_path = adjust_path(change.new_path)
"@
    $script | Out-File -FilePath $scriptPath -Encoding utf8
    return $scriptPath
}
# Main execution
if (-not (Test-GitRepository $Path)) {
    Write-Host "Error: Not a git repository: $Path" -ForegroundColor Red
    exit 1
}
if (-not (Test-GitFilterRepo)) {
    exit 1
}
Push-Location $Path
try {
    # Get deleted and untracked files
    $deletedFiles = Get-DeletedFiles
    $untrackedFiles = Get-UntrackedFiles
    if (-not $deletedFiles) {
        Write-Host "No deleted files found in the repository." -ForegroundColor Yellow
        exit 0
    }
    Write-Host "Analyzing potential file moves..." -ForegroundColor Cyan
    # Find potential moves
    $moves = @()
    $moves += Find-PotentialMovesByName $deletedFiles $untrackedFiles
    $moves += Find-PotentialMovesByContent $deletedFiles $untrackedFiles $MinSimilarity
    if (-not $moves) {
        Write-Host "No potential moves detected." -ForegroundColor Yellow
        exit 0
    }
    # Sort moves by similarity, best matches first
    $moves = $moves | Sort-Object -Property Similarity -Descending
    # Display findings
    Write-Host "`nDetected potential moves:" -ForegroundColor Cyan
    foreach ($move in $moves) {
        Write-Host $move -ForegroundColor $(if ($move.Similarity -eq 100) { "Green" } else { "Yellow" })
    }
    if ($DryRun) {
        Write-Host "`nDry run - no changes made." -ForegroundColor Yellow
        exit 0
    }
    # Confirm moves
    Write-Host "`nDo you want to proceed with these moves? (Y/N)" -ForegroundColor Cyan
    $confirm = Read-Host
    if ($confirm -notmatch '^[Yy]') {
        Write-Host "Operation cancelled." -ForegroundColor Yellow
        exit 0
    }
    # Generate the (currently unused) filter-repo script
    $scriptPath = New-FilterRepoScript $moves
    Write-Host "`nExecuting git-filter-repo..." -ForegroundColor Cyan
    # Build path-rename arguments
    $renameArgs = $moves | ForEach-Object { "--path-rename", "$($_.OldPath):$($_.NewPath)" }
    # Execute git-filter-repo with path-rename arguments
    git filter-repo --force $renameArgs
    # Only report success when git-filter-repo actually succeeded
    if ($LASTEXITCODE -ne 0) {
        Write-Host "`ngit-filter-repo failed with exit code $LASTEXITCODE." -ForegroundColor Red
        exit $LASTEXITCODE
    }
    Write-Host "`nMoves completed successfully!" -ForegroundColor Green
    Write-Host @"
Next steps:
1. Review the changes using 'git log' or 'git status'
2. If satisfied, commit any remaining changes
3. Push your changes
"@ -ForegroundColor Cyan
}
finally {
    Pop-Location
    if ($scriptPath -and (Test-Path $scriptPath)) {
        Remove-Item $scriptPath
    }
}

Test Report for git-smart-move.ps1

Test Environment

  • Windows 10/11
  • Command Prompt (cmd.exe)
  • Windows PowerShell
  • Git version control system
  • git-filter-repo (installed via pip)

Test Setup

Created a test repository with the following initial structure:

.
├── README.txt
├── docs/
│   └── test.md
└── src/
    ├── main.js
    └── styles.css

Test Operations

Performed the following file moves:

  1. README.txt → README.md
  2. src/main.js → lib/js/main.js
  3. src/styles.css → assets/css/main.css
  4. docs/test.md → documentation/guide.md

Test Results

Successful Detections

The script correctly detected and preserved history for:

  1. README.txt → README.md (100% match, content hash)
  2. src/main.js → lib/js/main.js (100% match, detected by both content hash and exact name match)
  3. docs/test.md → documentation/guide.md (100% match, content hash)
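
Because src/main.js was reported by both detectors, the move list can contain the same pair twice. A minimal sketch of collapsing candidates to one per old path follows. `Select-UniqueMove` is a hypothetical helper, not part of the script, and the sample objects only mimic the `FileMove` properties. It does not resolve the case where two old paths claim the same new path.

```powershell
# Hypothetical helper: keep one candidate per old path, preferring the
# highest similarity. Not part of git-smart-move.ps1.
function Select-UniqueMove {
    param([Parameter(Mandatory)][object[]]$Moves)

    $Moves |
        Group-Object -Property OldPath |
        ForEach-Object {
            # Within each group of candidates, the best score wins
            $_.Group |
                Sort-Object -Property Similarity -Descending |
                Select-Object -First 1
        }
}

# Sample data shaped like the script's FileMove objects
$candidates = @(
    [pscustomobject]@{ OldPath = 'src/main.js'; NewPath = 'lib/js/main.js'; Similarity = 100; Method = 'exact_name_match' }
    [pscustomobject]@{ OldPath = 'src/main.js'; NewPath = 'lib/js/main.js'; Similarity = 100; Method = 'content_hash' }
    [pscustomobject]@{ OldPath = 'README.txt';  NewPath = 'README.md';      Similarity = 100; Method = 'content_hash' }
)

$unique = @(Select-UniqueMove -Moves $candidates)
$unique | ForEach-Object { "$($_.OldPath) -> $($_.NewPath)" }
```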

Issues Found

  1. Failed to detect: src/styles.css → assets/css/main.css
    • Likely due to significant filename change (styles → main)
    • File remained untracked after script execution
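
One possible way to catch a move like this is to let Git's own rename detection score the pair once the new file is staged, for example with `git diff --cached -M60% --name-status HEAD`. The sketch below only parses that output. `ConvertFrom-GitRenameStatus` is a hypothetical helper, and the sample lines, including the 87% score, are made-up illustrations rather than results from the test repository.

```powershell
# Hypothetical helper: turn "git diff --name-status -M" rename records into objects.
function ConvertFrom-GitRenameStatus {
    param([string[]]$Lines)

    foreach ($line in $Lines) {
        # Rename records look like "R087<TAB>old/path<TAB>new/path"
        if ($line -match '^R(\d{1,3})\t([^\t]+)\t([^\t]+)$') {
            [pscustomobject]@{
                OldPath    = $Matches[2]
                NewPath    = $Matches[3]
                Similarity = [int]$Matches[1]
            }
        }
    }
}

# Illustrative input only; real input would come from the git command above
$sample = @(
    "R087`tsrc/styles.css`tassets/css/main.css"
    "A`tnotes.txt"
)

$parsed = @(ConvertFrom-GitRenameStatus -Lines $sample)
$parsed | ForEach-Object { "$($_.OldPath) -> $($_.NewPath) ($($_.Similarity)%)" }
```

Lines that are not rename records, such as the added file in the sample, are skipped.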

Command Line Usage

Windows cmd.exe users can set up the following alias:

doskey ps=powershell.exe -ExecutionPolicy Bypass -Command "& $1 $2 $3 $4 $5"

Then use:

ps git-smart-move.ps1 -Path . [-DryRun]

Known Limitations

  1. Struggles with detecting moves when both path and filename change significantly
  2. May need adjustment of similarity threshold for better detection
  3. Current implementation focuses on exact content matches
  4. Requires manual handling of undetected moves
  5. PowerShell execution policy might need adjustment on some systems
  6. Original script required modification to work with git-filter-repo (path-rename vs python-callback)
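
Regarding limitation 6, the `--path-rename` route needs one `OLD:NEW` value per move. The sketch below builds that argument list, assuming git-filter-repo treats the colon as the separator, so paths containing a colon are rejected up front. `New-PathRenameArgument` is a hypothetical helper and the sample objects only mimic the `FileMove` properties.

```powershell
# Hypothetical helper: build --path-rename arguments for git-filter-repo.
function New-PathRenameArgument {
    param([Parameter(Mandatory)][object[]]$Moves)

    foreach ($move in $Moves) {
        # Assumption: OLD:NEW is split on ':', so a colon in a path is ambiguous
        if ($move.OldPath -match ':' -or $move.NewPath -match ':') {
            throw "Path contains ':' and cannot be used with --path-rename: $($move.OldPath)"
        }
        '--path-rename'
        "$($move.OldPath):$($move.NewPath)"
    }
}

$sampleMoves = @(
    [pscustomobject]@{ OldPath = 'README.txt';   NewPath = 'README.md' }
    [pscustomobject]@{ OldPath = 'docs/test.md'; NewPath = 'documentation/guide.md' }
)

$renameArgs = @(New-PathRenameArgument -Moves $sampleMoves)
# These could then be passed on as: git filter-repo --force @renameArgs
$renameArgs -join ' '
```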

Future Improvements

  1. Enhanced detection for files with similar content but different names
  2. Better handling of CSS and other text files where content might have minor variations
  3. Configuration options for similarity thresholds
  4. Support for complex rename patterns
  5. Better error handling for git-filter-repo operations
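
For improvement 5, one simple option is to wrap native commands so that a non-zero exit code becomes a PowerShell error. This is a sketch only. `Invoke-NativeCommand` is a hypothetical helper, and the demonstration runs `pwsh` with a deliberate failing exit code instead of git-filter-repo so that it has no side effects.

```powershell
# Hypothetical helper: run a native command and throw when it fails.
function Invoke-NativeCommand {
    param(
        [Parameter(Mandatory)][string]$FilePath,
        [string[]]$ArgumentList = @()
    )

    # Output of the command passes through as the function's output
    & $FilePath @ArgumentList
    if ($LASTEXITCODE -ne 0) {
        throw "$FilePath exited with code $LASTEXITCODE"
    }
}

# Demonstration with a command that fails on purpose (assumes pwsh is on PATH)
$failureMessage = $null
try {
    Invoke-NativeCommand -FilePath 'pwsh' -ArgumentList @('-NoProfile', '-Command', 'exit 3')
}
catch {
    $failureMessage = $_.Exception.Message
}
$failureMessage
```

In the script, the same wrapper could be called with `git` as the file path and the `filter-repo` arguments as the argument list, so the success message is only printed when the rewrite actually worked.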

Notes

  • Always run with -DryRun first to preview detected moves
  • Make sure to have git-filter-repo installed via pip
  • Backup your repository before running the script
  • Use with caution on shared repositories as it modifies history
Foadsf commented Nov 20, 2024

relevant discussions here and here.
