Comparing two folder structures with PowerShell

Recently, while working with a Google Workspace Data Export, I found that files were missing after I had generated and downloaded the export. Tracking down which ones were missing is difficult, given the size of the folder structure and the number of files involved.

To complicate matters, I was working on a Windows system that did not allow third-party tools. I therefore wrote a short PowerShell script to help me track down the problem.

My goal was a script that compares two folder structures and prints the full path of every item that exists in only one of them. One folder structure, referred to as Folder A, was located on the virtual drive provided by the Drive for desktop app. The other, Folder B, was the local copy of the downloaded Data Export.

There was one more complication: native Google files and the files in the Data Export use different extensions. For instance, a Google Sheets file has the extension .gsheet on the virtual drive, but it is converted to an MS Excel file ending in .xlsx in the export.

To enhance the versatility of the script, I aimed to include an option to ignore file extensions if necessary.
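As a minimal sketch of why ignoring extensions helps, the snippet below strips the final extension from two relative paths and compares the results. The file names are made up for illustration, and the pattern is one possible way to remove an extension without touching dots in folder names.

```powershell
# Hypothetical relative paths for the same spreadsheet on both sides
$drivePath  = 'Finance\Budget 2023.gsheet'   # as seen on the Drive for desktop volume
$exportPath = 'Finance\Budget 2023.xlsx'     # as seen in the downloaded export

# Remove the last dot and everything after it, but never cross a path separator
$pattern = '\.[^.\\/]*$'
$driveName  = $drivePath  -replace $pattern
$exportName = $exportPath -replace $pattern

# Both paths now reduce to 'Finance\Budget 2023', so they compare as equal
$driveName -eq $exportName
```

The trade-off is that unrelated files which differ only in their extension (say report.pdf and report.docx) are also treated as the same item.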

In summary, the requirements for the PowerShell script were as follows:

  • Compare two folders
  • Provide the ability to ignore file extensions
  • Output the full path of the affected files

Script

# Prompt the user to enter the paths for Folder A and Folder B
$folder1 = Read-Host "Enter the path for Folder A"
$folder2 = Read-Host "Enter the path for Folder B"

# Prompt for whether to ignore file extensions
$ignoreExtensions = 'y' -eq (Read-Host 'Ignore file extensions? (y/n)')

# Get all items (files and subfolders) in each folder and store their relative and full paths in arrays
$dir1Dirs, $dir2Dirs = $folder1, $folder2 | 
  ForEach-Object {
    $fullRootPath = Convert-Path -LiteralPath $_

    # Construct an array of custom objects for the folder tree
    , @(
      Get-ChildItem -Recurse -LiteralPath $fullRootPath |
        ForEach-Object {
          $relativePath = $_.FullName.Substring($fullRootPath.Length + 1)
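          # Strip only the final extension; excluding path separators keeps dots in parent folder names intact
          if ($ignoreExtensions) { $relativePath = $relativePath -replace '\.[^.\\/]*$' }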
          [PSCustomObject] @{
            RelativePath = $relativePath
            FullName = $_.FullName
          }
        }
    )
  }

# Compare the two arrays based on the RelativePath property.
# Folder B is the reference set, so '=>' marks items found only in Folder A.
$comparison = Compare-Object -Property RelativePath -PassThru $dir2Dirs $dir1Dirs

# Group the comparison results by SideIndicator
$groupedComparison = $comparison | Group-Object -Property SideIndicator

# Output an empty line before the grouped results
Write-Host

# Iterate over each group in the groupedComparison and output the results
$groupedComparison | ForEach-Object {
    $sideIndicator = $_.Name

    # Determine the text to display based on the side indicator
    $sideIndicatorText = if ($sideIndicator -eq '=>') { 'Only in Folder A' } else { 'Only in Folder B' }

    # Output the side indicator text and a separator
    Write-Host "$sideIndicatorText"
    Write-Host "-----------------------------"

    # Iterate over each item in the group and output the full name
    $_.Group | ForEach-Object {
        Write-Host $_.FullName
    }

    # Output an empty line after each group
    Write-Host
}

# Output a message if no differences are found
if (-not $comparison) {
    Write-Host "No differences found between the two folders."
}
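
To see how the side indicators map to the two folders, here is a small, self-contained sketch that uses the same argument order as the script (Folder B first, Folder A second). The relative paths are invented for the example.

```powershell
# Hypothetical folder contents, reduced to the RelativePath property
$folderB = 'docs\a.txt', 'docs\b.txt' | ForEach-Object { [PSCustomObject] @{ RelativePath = $_ } }
$folderA = 'docs\b.txt', 'docs\c.txt' | ForEach-Object { [PSCustomObject] @{ RelativePath = $_ } }

# First positional argument is the reference set (B), second the difference set (A)
$result = Compare-Object -Property RelativePath -PassThru $folderB $folderA

# '=>' means the item exists only in the difference set (Folder A),
# '<=' means it exists only in the reference set (Folder B)
$onlyInA = @($result | Where-Object SideIndicator -eq '=>').RelativePath
$onlyInB = @($result | Where-Object SideIndicator -eq '<=').RelativePath

Write-Host "Only in Folder A: $onlyInA"
Write-Host "Only in Folder B: $onlyInB"
```

Items present on both sides (docs\b.txt here) are left out of the result, because Compare-Object reports only differences unless -IncludeEqual is specified.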

Execution Policy

PowerShell has an execution policy that determines which scripts are allowed to run, and a restrictive setting can block this script. You can check the current policy by running Get-ExecutionPolicy. If it is set to Restricted (no scripts may run) or AllSigned (only signed scripts may run), you may need to change it to a less restrictive policy with the Set-ExecutionPolicy command. For example:

Set-ExecutionPolicy RemoteSigned allows locally created scripts to run, while scripts downloaded from the internet still need to be signed. Without a -Scope argument this changes the machine-wide setting and requires an elevated PowerShell session.

or

Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process -Force bypasses the execution policy for the current PowerShell session only. The setting is discarded when the session ends.
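
The check can also be scripted. The following is a rough sketch rather than a recommended setup: Test-BlocksLocalScript is a hypothetical helper name, and it assumes that only Restricted and AllSigned stop an unsigned, locally created script.

```powershell
# Hypothetical helper: returns $true for policies that block unsigned local scripts
function Test-BlocksLocalScript {
    param([string] $Policy)
    $Policy -in 'Restricted', 'AllSigned'
}

# Effective policy for the current session
$current = Get-ExecutionPolicy

if (Test-BlocksLocalScript -Policy $current) {
    # Relax the policy for this session only; nothing is changed permanently
    Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process -Force
    Write-Host "Policy was $current; bypassed for this session."
} else {
    Write-Host "Policy is $current; no change needed."
}
```

Running Get-ExecutionPolicy -List shows the value for every scope, which helps when a setting is enforced through Group Policy and a session-level change has no effect.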

Final Note

The primary purpose of this script is to make it easy to validate a data export. On closer examination, however, many things can legitimately differ in the export without indicating a problem. The file extension, the file size, and even the file name can change as a result of the export and conversion. For instance, special characters are removed from file names, and if multiple files share the same name, a number is appended to keep them unique.

In short, the script gives an indication of whether everything went well or whether further investigation is required. A reported difference is a starting point for a closer look, not proof that a file is missing.

Update: Special thanks to “Giselle Valladares” from Stack Overflow for refining and troubleshooting the updated version of this script.

Written by: asterix
