SharePoint automation (v1)

The SharePointClient class provides a convenient Python interface for interacting with SharePoint sites:

Seamlessly upload, download, and manage files and folders.
Read Excel or CSV data into pandas DataFrames.
Perform bulk operations like recursive folder traversal.
Handle large file sizes with chunked uploads.
Leverage built-in retry logic for network reliability.

Initialization

from shared.sharepoint.client import SharePointClient

# Create a client for the "Stoneware" SharePoint site.
client = SharePointClient("Stoneware")

Parameters:

site (str | None): The name of the SharePoint site. If None, defaults to the root site.
max_retries (int, optional): Maximum number of retry attempts for throttled or failed operations. Defaults to 3.

Notes:

Creating a SharePointclient instance for a site requires an authentication request. You should only do it when you need the client and never in the global scope. Otherwise, every time your code is imported will trigger a network request.
When you call SharePointClient(site) multiple times within the same thread, you get the same client instance. This makes it easier to deal with the requirement above, as repeated creation calls for the same site use a single session and do not repeat authentication calls.

Using the client

Most methods expect to receive a path. It can be provided as a string ("Folder/Subfolder/filename.ext") or a list of strings (["Folder", "Subfolder", "filename.ext"]), the client supports both.

Also, most methods are available on both the client and its associated File and Folder objects, which can be obtained by using get_file(path) and get_folder(path). You can use one or the other, they're generally equivalent. For example client.read_file(path) does client.get_file(path).read() under the hood.

The File and Folder objects are helpful to access metadata or for more advanced use cases. They store a reference to their client internally, so you can mix files and folders from different sites without a problem.

Retrieving Files and Folders

get_file(path: str | list[str]) → File
Returns a File object representing a file at the given path.
get_folder(path: str | list[str]) → Folder
Returns a Folder object representing a folder at the given path.

Checking Existence

file_exists(path: str | list[str]) → bool
Checks if a file exists at the given path.
folder_exists(path: str | list[str]) → bool
Checks if a folder exists at the given path.

Listing Files and Folders

list_files(path: str | list[str]) → list[str]
Returns a list of filenames in the specified folder.
list_folders(path: str | list[str]) → list[str]
Returns a list of subfolder names in the specified folder.

These two methods return a list of strings for historical compatibility. For more advanced listing capabilities, use get_folder(path) to get a Folder object, which provides:

list_files(recursive: bool = False) → list[File]
Lists all files in the folder, optionally including files in subfolders recursively. Recursive calls for large folder trees can take multiple minutes.
list_folders(recursive: bool = False) → list[Folder]
Lists all subfolders, optionally including nested folders recursively. Recursive calls for large folder trees can take multiple minutes.

Uploading and Downloading

upload_file(path: str | list[str], file: str | Path | bytes | io.BytesIO, chunksize: int = int(3.5e6))
Uploads a file (bytes, io.BytesIO, or local file path) to the specified location on SharePoint. Automatically uses chunked uploads for large files (>3.5 MB).
download_file(path: str | list[str], out: str | Path)
Downloads a file from SharePoint to your local filesystem.

Reading into a DataFrame

read_file(path: str | list[str]) → bytes
Reads the binary contents of a file from SharePoint.
read_dataframe(path: str | list[str], sheet_name: int | str = 0, **kwargs) → pandas.DataFrame
Reads an Excel file from SharePoint into a pandas DataFrame.

Folder and File Management

create_folder(path: str | list[str]) → Folder
Ensures the specified folder path exists on SharePoint (creating intermediate folders if necessary) and returns the resulting Folder object.
move_folder(current_path: str | list[str], new_path: str | list[str])
Moves a folder from current_path to new_path.
rename_file(path: str | list[str], new_name: str)
Renames a file at the given path.
move_file(path: str | list[str], new_folder: str | list[str], overwrite: bool = False)
Moves a file to a new folder, optionally overwriting an existing file.
delete_file(path: str | list[str])
Deletes a file at the given path.

Shapefile Support

load_shapefile(path: str | list[str]) → geopandas.GeoDataFrame
Locates and downloads all mandatory Shapefile components (e.g., .shp, .dbf, .shx) from a folder, then reads them into a GeoDataFrame.

Example Usage

from shared.sharepoint.client import SharePointClient

# Instantiate a SharePoint client
client = SharePointClient("AnalyticalData")

# 1. Creating a new folder
folder = client.create_folder("new-data-folder")

# 2. Uploading a local file
local_file_path = "./data/sample.xlsx"
client.upload_file(["new-data-folder", "sample.xlsx"], local_file_path)

# 3. Reading that file into a DataFrame
df = client.read_dataframe(["new-data-folder", "sample.xlsx"], sheet_name=0)

# 4. Downloading the file
client.download_file(["new-data-folder", "sample.xlsx"], "./downloaded_sample.xlsx")

# 5. Checking if the file exists
if client.file_exists(["new-data-folder", "sample.xlsx"]):
    print("File successfully uploaded!")

# 6. Moving the file
client.move_file(
    ["new-data-folder", "sample.xlsx"],
    ["archived-data", "sample_archived.xlsx"],
    overwrite=True
)

# 7. Deleting the old folder
client.get_folder("new-data-folder").delete()

Error Handling and Retries

All SharePoint operations invoke an internal _execute_query() method that uses incremental retry logic for up to max_retries. If throttling or network errors occur, the client automatically waits and retries with exponential backoff.

You can control the default number of retries using the max_retries argument:

client = SharePointClient("AnalyticalData", max_retries=5)

V2

Overview

SharePointClientV2 is designed to simplify reading, writing, and managing files/folders within a SharePoint Document Library, as well as interacting with Excel workbooks via Microsoft Graph. The library supports:

Authentication and token management for Microsoft Graph
Throttling, retry, and monitoring of long-running operations
Handling files and folders (SharePointFile, SharePointFolder)
Running queries and uploads/downloads for large files
Creating and managing workbook sessions to read and update Excel files in SharePoint

Most methods use asyncio, providing concurrency-friendly operations.

Key Classes

SharePointClientV2

This is the main entry point for SharePoint interactions.

Key Responsibilities:

Managing OAuth2 tokens
Providing methods to list and create sites, files, folders
Handling uploads and downloads (chunked for large files)
Managing workbook sessions
Caching references to site/directory info
Managing concurrency through built-in rate limits, retries, and backoff

Highlighted Methods:

db_client(pg_url=PGURL) -> SharePointDB
- Returns a reference to a database-backed helper (SharePointDB) that can synchronize items to a Postgres DB.
- Helpful if you want to track updates from the Graph API.
get_file(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFile | None
- Retrieves a file by path, returning a SharePointFile.
- Accepts an optional site to choose which SharePoint Site to use, otherwise uses the default.
- If allow_not_found=True, returns None if the file is not found.
get_folder(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFolder | None
- Retrieves a folder by path, returning a SharePointFolder.
- If allow_not_found=True, returns None if the folder is not found.
ensure_folder(path: str | list[str], site: str | None = None) -> SharePointFolder
- If the folder does not exist, creates it. Otherwise returns the existing folder.
item_exists(path: str | list[str], site: str | None = None) -> bool
- Checks if a file or folder exists at the given path.
list_sites()
- Lists all available SharePoint sites.
_create_link(...)
- Internal method to create a sharing link. Typically used through SharePointFile.create_link(...) or SharePointFolder.create_link(...).
_read_file(file_id: str, site: str) -> bytes
- Internal method that fetches the file’s binary contents.
_upload_file(folder_id: str, name: str, file: Path|bytes, ...) -> None
- Internal method to handle small or chunked file uploads.
close()

Closes the underlying async session.

Initialization:

from shared.sharepoint.client_v2 import SharePointClientV2

async with SharePointClientV2(default_site="MySiteName") as sp:
    # sp is ready for usage
    # e.g. file = await sp.get_file(["folder", "subfolder", "myfile.docx"])
    pass

SharePointFile

Represents a file in SharePoint.

Key Attributes:

id: Unique ID of the file in SharePoint
name: Name of the file (filename)
path: Path of the file, e.g. /Some/Folder/filename.docx
url: SharePoint web URL for the file
size: File size in bytes
deleted: Boolean indicating if the file has been marked deleted in SharePoint

Key Methods:

read() -> bytes
- Reads the file content into memory.
download(out: str | Path) -> None
- Downloads the file locally to the specified path.
rename(name: str | None, description: str | None = None) -> SharePointFile
- Renames the file (and optionally sets a description).
move(path: str | list[str], new_name: str | None = None)
- Moves the file to a different folder path and optionally renames it.
copy(path: str | list[str], new_name: str | None = None)
- Copies the file to a different path, optionally changing its name.
delete()
- Permanently deletes the file.
create_workbook_session(persist_changes=True, auto_refresh=False) -> WorkbookSession
- Initiates a workbook session if the file is an Excel workbook.
- persist_changes means changes persist once the session closes.
- auto_refresh automatically refreshes the session token.

SharePointFolder

Represents a folder in SharePoint.

Key Attributes:

id: Unique folder ID
name: Folder name
path: Full path for the folder, e.g. /Documents/Reports
children: Number of immediate children in the folder
deleted: Boolean indicating if the folder is marked as deleted

Key Methods:

list_children() -> (List[SharePointFile], List[SharePointFolder])
- Lists all child items in the folder.
list_files() -> List[SharePointFile] and list_folders() -> List[SharePointFolder]
- Helper methods to list files only or folders only.
upload_file(name: str, file: str | Path | bytes, chunksize=50_000_000)
- Uploads a new file to the folder; large files are chunked by default.
delete_child(name: str) -> None
- Deletes a file or folder by name.
ensure_folder(path: str | list[str]) -> SharePointFolder
- Creates subfolders as needed.
rename(name: str | None, description: str | None = None)
- Renames the folder.
move(...), copy(...), delete(...)
- Similar to SharePointFile operations.

WorkbookSession

Represents an active session for editing an Excel workbook stored in SharePoint (via the Graph API). Created via SharePointFile.create_workbook_session(...).

Key Methods:

list_worksheets() -> List[Worksheet]
- Returns all worksheets within the workbook.
get_worksheet(name_or_id: str) -> Worksheet
- Retrieves a worksheet by its name or ID.
close()
- Closes the session. If persist_changes=True, changes are committed.
batch_request(requests: List[dict]) -> List[dict]
- Sends multiple Excel Graph operations in a single batch call.
refresh_session()
- Extends the session lifetime to avoid expiration.
__aenter__ / __aexit__
- Allows async with WorkbookSession(...) usage.

Within each WorkbookSession, you can perform advanced operations on worksheets:

reading/writing cells
adjusting formatting
creating tables
etc.

Examples and Usage

Initializing the Client

from shared.sharepoint.client_v2 import SharePointClientV2
import asyncio

async def main():
    # Create the client with a default site.
    async with SharePointClientV2(default_site="Stoneware") as sp:
        # Now sp can manage files, folders, and Excel content.
        pass

asyncio.run(main())

Listing Files and Folders

files = await sp.list_files(
    folder="/Data/Subfolder",
    recursive=True,
    site="Stoneware"  # optional if default is set
)

for f in files:
    print("File:", f.name, f.url, f.size)

Uploading and Downloading Files

# Upload a local file to a SharePoint folder
folder = await sp.get_folder("/Documents/Reports")
await folder.upload_file("report.pdf", "./local_report.pdf")

# Download the newly uploaded file
file = await sp.get_file("/Documents/Reports/report.pdf")
await file.download("./report_downloaded.pdf")

Working with Workbooks and Worksheets

# Suppose we have an Excel file named "PSA_Mastersheet.xlsx" in the "PSA Data" folder
# We want to open it and read some data.

file = await sp.get_file(["PSA Data", "PSA_Mastersheet.xlsx"])

async with file.create_workbook_session(persist_changes=True) as wb:
    sheets = await wb.list_worksheets()
    first_sheet = sheets[0]

    # Clear some cells
    await first_sheet.clear_cells("A1:B10", apply_to="All")

    # Write some data
    await first_sheet.write_cells(
        address="A1",
        values=[["Sample ID", "Run", "Date"], ["ABC123", "R1", "2025-02-11"]],
    )

    # Optionally create a table from a DataFrame
    # data_frame = pd.DataFrame(...)
    # await first_sheet.create_table_from_dataframe(
    #     df=data_frame,
    #     start="A1",
    #     header=True,
    #     style="TableStyleMedium2",
    # )

Incorporating in PSA code

Within psa.py, we can see how some of these structures might be used:

SharePointFile objects are passed around to parse file names, read content, etc.
Large data transformations are done in memory (pandas) and then optionally appended back to SharePoint.
Some tasks (like reading CSV data) rely on file.read() to get the raw content.
The psa_mastersheet asset in psa.py references the SharePointClientV2("PSA Data"), obtains an Excel file, and updates it with the new data.

The typical pattern is:

Create or retrieve a client
Grab references to specific folders or files
For Excel, open a WorkbookSession
Make the necessary range edits (clear cells, update values, create tables, etc.)
Close the session (or let the async with block do it), persisting changes.

That covers the main functionality and usage patterns for SharePointClientV2, SharePointFile, SharePointFolder, and WorkbookSession.

Additional Classes (not covered in detail but also used internally):

Worksheet / WorksheetRange for Excel cell range operations
Table, TableColumn, TableRow for advanced table-based operations
SharePointDB for caching/synchronizing a site structure in a Postgres database.

For further references, see the docstrings within each method and the usage examples in the psa.py module.

Last Updated: 2025-02-11.

Initialization​

Using the client​

Retrieving Files and Folders​

Checking Existence​

Listing Files and Folders​

Uploading and Downloading​

Reading into a DataFrame​

Folder and File Management​

Shapefile Support​

Example Usage​

Error Handling and Retries​

V2

Overview​

Key Classes​

SharePointClientV2​

SharePointFile​

SharePointFolder​

WorkbookSession​

Examples and Usage​

Initializing the Client​

Listing Files and Folders​

Uploading and Downloading Files​

Working with Workbooks and Worksheets​

Incorporating in PSA code​

Initialization

Using the client

Retrieving Files and Folders

Checking Existence

Listing Files and Folders

Uploading and Downloading

Reading into a DataFrame

Folder and File Management

Shapefile Support

Example Usage

Error Handling and Retries

Overview

Key Classes

SharePointClientV2

SharePointFile

SharePointFolder

WorkbookSession

Examples and Usage

Initializing the Client

Listing Files and Folders

Uploading and Downloading Files

Working with Workbooks and Worksheets

Incorporating in PSA code