Analyze a Google Drive folder structure via Python

This script calculates and reports key metrics for a given Google Drive folder. It recursively examines the folder's contents, determining the total size of all files, the number of files, the number of subfolders, and the maximum depth of nested subfolders.

These metrics are useful for organizing and optimizing your folder structures, and they are essential before moving a folder tree into a shared drive, since shared drives come with limitations (a maximum of 400,000 items and up to 20 levels of nested folders).
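As a quick sanity check, the measured counts can be compared against those shared-drive limits. The helper below is a minimal sketch; the function name `fits_in_shared_drive` and the constants are illustrative, not part of the Drive API:

```python
# Shared-drive limits as documented by Google (illustrative constants)
SHARED_DRIVE_MAX_ITEMS = 400_000  # files + folders combined
SHARED_DRIVE_MAX_DEPTH = 20       # levels of nested folders

def fits_in_shared_drive(file_count, folder_count, max_depth):
    """Return True if the measured folder tree stays within shared-drive limits."""
    total_items = file_count + folder_count
    return total_items <= SHARED_DRIVE_MAX_ITEMS and max_depth <= SHARED_DRIVE_MAX_DEPTH
```

Feeding the script's results into this check tells you up front whether a migration will hit either limit.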

Prerequisites

To run the script successfully, you will need the following prerequisites:

  1. Create a new Google Project
  2. Enable the Drive API
  3. Create an OAuth consent screen
  4. Create OAuth 2.0 Client IDs – Desktop App
  5. Download the JSON file
  6. Set up Python
    $ sudo apt install python3-pip
    $ pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Further information can be found in the Python quickstart guide from Google.

To run the script, rename the JSON file you've downloaded to "credentials.json" and place it in the script's directory. To retrieve information about a specific folder, set the "folder_id" variable; it defaults to "root".
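If you have the folder open in the browser, its ID is the last path segment of the URL (`https://drive.google.com/drive/folders/<ID>`). A small helper like the hypothetical `folder_id_from_url` below can extract it; this is a convenience sketch, not part of the script above:

```python
import re

def folder_id_from_url(url: str) -> str:
    """Extract the folder ID from a Drive folder URL such as
    https://drive.google.com/drive/folders/<ID>."""
    match = re.search(r'/folders/([A-Za-z0-9_-]+)', url)
    if not match:
        raise ValueError(f"No folder ID found in: {url}")
    return match.group(1)
```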

Script

import os.path
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Define the Google Drive API scopes for accessing files and folders
# If modifying these scopes, delete the file token.json
SCOPES = ['https://www.googleapis.com/auth/drive']
TOKEN_PATH = 'token.json'
CREDENTIALS_PATH = 'credentials.json'

def get_credentials():
    # Gets user credentials or prompts for authorization if needed
    # Try to load existing credentials from 'token.json'
    creds = Credentials.from_authorized_user_file(TOKEN_PATH, SCOPES) if os.path.exists(TOKEN_PATH) else None

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # Refresh the credentials if they are expired and can be refreshed
            creds.refresh(Request())
        else:
            # If no valid credentials exist, initiate the OAuth flow to obtain them
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_PATH, SCOPES)
            creds = flow.run_local_server(port=0)

        # Save the obtained credentials for future runs
        with open(TOKEN_PATH, 'w') as token_file:
            token_file.write(creds.to_json())

    return creds

def get_folder_size_and_file_count(drive_service, folder_id, total_size=0, file_count=0, folder_count=0, current_depth=0, max_depth=0):
    try:
        # List all files and folders in the specified folder, following
        # nextPageToken so folders with more than 1000 items are fully counted
        items = []
        page_token = None
        while True:
            results = drive_service.files().list(
                pageSize=1000,  # The maximum number of files returned per API request
                q=f"'{folder_id}' in parents",
                fields="nextPageToken, files(id,mimeType,size)",
                includeItemsFromAllDrives=True,
                supportsAllDrives=True,
                pageToken=page_token
            ).execute()
            items.extend(results.get('files', []))
            page_token = results.get('nextPageToken')
            if page_token is None:
                break

        for item in items:
            if item['mimeType'] == 'application/vnd.google-apps.folder':
                # If it's a subfolder, calculate its size and contents recursively
                total_size, file_count, folder_count, current_depth, max_depth = get_folder_size_and_file_count(
                    drive_service, item['id'], total_size, file_count, folder_count, current_depth + 1, max_depth)
                folder_count += 1
            elif 'size' in item:
                # If the 'size' key exists, add its size to the total
                total_size += int(item['size'])
                file_count += 1
            else:
                # If there is no 'size' key, increment the file count without adding to the total size
                file_count += 1

        # Update the maximum folder depth
        max_depth = max(max_depth, current_depth)

    except HttpError as error:
        print(f"An error occurred: {error}")

    # Return current_depth - 1 so the caller's own depth is restored after recursion
    return total_size, file_count, folder_count, current_depth - 1, max_depth

def print_folder_size_and_file_count(size_bytes, file_count, folder_count, max_depth):
    # Convert the size to KB, MB, or GB for readability
    size_kb = size_bytes / 1024
    size_mb = size_kb / 1024
    size_gb = size_mb / 1024

    if size_gb >= 1:
        print(f"Total Folder Size: {size_gb:.2f} GB")
    elif size_mb >= 1:
        print(f"Total Folder Size: {size_mb:.2f} MB")
    else:
        print(f"Total Folder Size: {size_kb:.2f} KB")

    print(f"Total File Count: {file_count}")
    print(f"Total Folder Count: {folder_count}")
    print(f"Max Folder Depth: {max_depth}")

def main():
    # Main function to list files and folders recursively from the root folder
    # Get user credentials for accessing Google Drive
    creds = get_credentials()
    service = build('drive', 'v3', credentials=creds)

    # Set the folder ID to 'root' to check the root folder or specify a different folder ID
    folder_id = 'root'
    
    # Calculate folder size, file count, folder count, and max folder depth
    size_bytes, file_count, folder_count, _, max_depth = get_folder_size_and_file_count(service, folder_id)
    
    # Print the calculated information
    print_folder_size_and_file_count(size_bytes, file_count, folder_count, max_depth)

if __name__ == '__main__':
    main()
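The unit conversion inside print_folder_size_and_file_count can also be factored into a standalone helper, for example if you want the size as a string rather than printed output. This sketch mirrors the script's 1024-based thresholds:

```python
def human_readable_size(size_bytes: int) -> str:
    """Format a byte count as KB, MB, or GB, matching the script's thresholds."""
    size_kb = size_bytes / 1024
    size_mb = size_kb / 1024
    size_gb = size_mb / 1024
    if size_gb >= 1:
        return f"{size_gb:.2f} GB"
    if size_mb >= 1:
        return f"{size_mb:.2f} MB"
    return f"{size_kb:.2f} KB"
```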