
SharePoint automation (v2)

Context

SharePointClientV2 is a redesign of our SharePoint automation from scratch.

It aims to address the following shortcomings of V1:

  • Poor performance of the underlying Office 365 library
  • Lack of clarity around the number of network requests being made
  • Limited reliability, addressed here by moving to the more modern Microsoft Graph API
  • Missing features, such as updating spreadsheets collaboratively while other users have them open

It enables you to read, write, navigate, and manage files and folders, as well as interact with Excel spreadsheets. The main differences from V1 are:

  • Written from scratch, providing us with full control and using the modern Microsoft Graph API
  • Async interface, enabling parallelization of network requests. This adds a little complexity, but unlocks significantly better performance.
  • Read & update spreadsheets directly in SharePoint, even if other users have them open.
  • Single client for all SharePoint sites
  • Continuous synchronization of changes to database tables with the new SharePointDB
  • Fast recursive listing of files and folders with the new SharePointDB
  • More robust, thanks to better retry mechanisms, proper throttling to avoid hitting rate limits, and respect for "Retry-After" delays returned by the API.
  • Batch queries (advanced, mostly used internally)

General considerations

  • A single client can be used across all SharePoint sites. All methods on the SharePoint client have an optional site keyword argument. A default site can be passed when initializing the client, to avoid having to pass it explicitly every time.
  • Initializing multiple clients is also not a problem: they share the same HTTP session and rate limiter, to maximize performance and avoid throttling.
  • If you are processing multiple files in parallel, it's worth defining a process_file async function and running it concurrently on all files with asyncio.gather.
  • Try to work in safe locations when testing out your code during development. The recommended way to do that is to use the IS_PROD and IS_DEV flags (from shared.env import IS_PROD, IS_DEV) to write either to the real folder or to a test folder. The recommended location for testing is the 99. Tests/Outputs folder in the Stoneware SharePoint site.
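The asyncio.gather pattern mentioned above can be sketched like this. The process_file body here is a stand-in (it just uppercases a name) so the snippet is self-contained and runnable; in real code it would await client methods such as file.read():

```python
import asyncio

# Sketch of the recommended fan-out pattern. `process_file` is a stand-in:
# in real code it would await client methods such as `file.read()`.
async def process_file(name: str) -> str:
    await asyncio.sleep(0)  # placeholder for real network I/O
    return name.upper()

async def process_all(names: list[str]) -> list[str]:
    # Run process_file concurrently on all files; results keep input order.
    return await asyncio.gather(*(process_file(n) for n in names))

results = asyncio.run(process_all(["report.pdf", "summary.xlsx"]))
print(results)  # ['REPORT.PDF', 'SUMMARY.XLSX']
```

Because the client shares one HTTP session and rate limiter, fanning out like this is safe: the client throttles the concurrent requests for you.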

Working with files and folders

Full example

sp = SharePointClientV2("PSA Data")

# Navigate folders
folder = await sp.get_folder("/Data/Reports")
file = await sp.get_file("/Data/Reports/report.pdf")

# Get file/folder metadata
print(file.name, file.size, file.created_at, file.modified_at)
print(folder.name, folder.children, folder.created_at)


new_folder = await sp.create_folder("/Data/Reports/2024")
# `ensure_folder` creates the folder if it doesn't exist and doesn't error otherwise
new_folder = await sp.ensure_folder("/Data/Reports/2024")

file = await file.move("/Data/Reports/2024/report.pdf")

copy_file = await file.copy("/Data/Reports/2024/report_copy.pdf")
copy_file = await copy_file.rename("new_name.pdf")
await copy_file.delete()

folder = await folder.rename("Reports 2024")

# `recursive=True` is available, but not always fast. For large folder
# and file structures, using the `SharePointDB` is recommended (see below).
subfolders = await folder.list_folders()
files = await folder.list_files()

await folder.delete()

SharePointClientV2

This is the main entry point for SharePoint interactions.

Key Responsibilities:

  • Providing methods to retrieve, list, and create sites, files, folders
  • Handling uploads and downloads (chunked for large files)
  • Managing workbook sessions
  • Managing authentication tokens
  • Managing concurrency through built-in rate limits, retries, and backoff

Constructor:

def __init__(
    self,
    default_site: str | None = None,
    monitoring: Monitoring | None = None,
    logger: logging.Logger | PrintLogger | None = None,
) -> None

  • default_site: Name of the default SharePoint site to use. If provided, methods won't need an explicit site parameter.
  • monitoring: Optional configuration for long-running operations (upload/copy). Defaults to Monitoring().
  • logger: Optional logger for debug/warning messages. Defaults to PrintLogger(level=logging.INFO).

Methods:

  1. get_file(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFile | None

    • Retrieves a file by path, returning a SharePointFile.
    • Accepts an optional site to choose which SharePoint Site to use, otherwise uses the default.
    • If allow_not_found=True, returns None if the file is not found.
  2. get_folder(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFolder | None

    • Retrieves a folder by path, returning a SharePointFolder.
    • If allow_not_found=True, returns None if the folder is not found.
  3. ensure_folder(path: str | list[str], site: str | None = None) -> SharePointFolder

    • If the folder does not exist, creates it. Otherwise returns the existing folder.
  4. item_exists(path: str | list[str], site: str | None = None) -> bool

    • Checks if a file or folder exists at the given path.
  5. list_sites()

    • Lists all available SharePoint sites.
  6. _create_link(...)

    • Internal method to create a sharing link. Typically used through SharePointFile.create_link(...) or SharePointFolder.create_link(...).
  7. _read_file(file_id: str, site: str) -> bytes

    • Internal method that fetches the file’s binary contents.
  8. _upload_file(folder_id: str, name: str, file: Path|bytes, ...) -> None

    • Internal method to handle small or chunked file uploads.
  9. close()

    • Closes the underlying async session (good practice for long-running scripts).

SharePointFile

Represents a file in SharePoint.

Attributes:

  • id: Unique ID of the file in SharePoint
  • name: Name of the file (filename)
  • path: Path of the file, e.g. /Some/Folder/filename.docx
  • url: SharePoint web URL for the file
  • size: File size in bytes
  • deleted: Boolean indicating if the file has been marked deleted in SharePoint

Methods:

  1. read() -> bytes
    • Reads the file content into memory.
  2. read_dataframe(sheet_name: int | str = 0, **kwargs) -> pd.DataFrame | dict[str, pd.DataFrame]
    • Downloads the file and reads it with pandas.read_excel.
    • Great when you want the sheet contents as a DataFrame and do not need workbook-session features.
  3. read_workbook() -> openpyxl.Workbook
    • Downloads the file and opens it with openpyxl.load_workbook.
    • This is the easiest path when you want the full workbook object locally.
  4. download(out: str | Path) -> None
    • Downloads the file locally to the specified path.
  5. rename(name: str | None, description: str | None = None) -> SharePointFile
    • Renames the file (and optionally sets a description).
  6. move(path: str | list[str], new_name: str | None = None)
    • Moves the file to a different parent folder and optionally renames it.
  7. copy(path: str | list[str], new_name: str | None = None)
    • Copies the file to a different parent folder and optionally renames it.
  8. delete()
    • Permanently deletes the file.
  9. create_workbook_session(persist_changes=True, auto_refresh=False) -> WorkbookSession
    • Initiates a workbook session if the file is an Excel workbook.
    • persist_changes means changes persist once the session closes.
    • auto_refresh automatically refreshes the session token.

SharePointFolder

Represents a folder in SharePoint.

Attributes:

  • id: Unique folder ID
  • name: Folder name
  • path: Full path for the folder, e.g. /Documents/Reports
  • children: Number of immediate children in the folder
  • deleted: Boolean indicating if the folder is marked as deleted

Methods:

  1. list_children() -> (List[SharePointFile], List[SharePointFolder])
    • Lists all child items in the folder.
  2. list_files() -> List[SharePointFile] and list_folders() -> List[SharePointFolder]
    • Helper methods to list files only or folders only.
  3. upload_file(name: str, file: str | Path | bytes, chunksize=50_000_000)
    • Uploads a new file to the folder; large files are chunked by default.
  4. delete_child(name: str) -> None
    • Deletes a file or folder by name.
  5. ensure_folder(path: str | list[str]) -> SharePointFolder
    • Creates subfolders as needed.
  6. rename(name: str | None, description: str | None = None)
    • Renames the folder.
  7. move(path: str | list[str], new_name: str | None = None)
    • Moves the folder to a different parent folder and optionally renames it.
  8. copy(path: str | list[str], new_name: str | None = None)
    • Copies the folder to a different parent folder and optionally renames it.
  9. delete()
    • Permanently deletes the folder.

Working with spreadsheets

SharePoint v2 gives you two different ways to work with Excel files, and it helps to pick the right one up front.

  • Use read_dataframe() when you just want sheet data as pandas and do not need to edit the workbook in place.
  • Use read_workbook() when you want a local openpyxl workbook object and are happy to download the file.
  • Use create_workbook_session() when you want to edit the workbook directly in SharePoint without downloading and re-uploading the whole file.

That last option is the interesting one. It lets your code behave more like another live Excel client. In practice, that means you can update values, formatting, tables, and sheet structure while the workbook still lives in SharePoint.

A good mental model

If you are not changing anything, start simple:

sp = SharePointClientV2("Analytical Data")
file = await sp.get_file("/Reports/weekly_summary.xlsx")

# Quick read into pandas
df = await file.read_dataframe("Summary")

# Full local workbook if you want openpyxl features
wb = await file.read_workbook()
summary_sheet = wb["Summary"]

When you do want to edit the workbook in place, open a workbook session:

sp = SharePointClientV2("Analytical Data")
file = await sp.get_file("/Reports/weekly_summary.xlsx")

async with file.create_workbook_session(persist_changes=True) as workbook:
    sheet = await workbook.get_worksheet("Summary")
    await sheet.write_cells("A1", values=[["Updated by automation"]])

That session is the entry point for everything that follows.

A full workbook session example

This example touches most of the workbook features people actually reach for: reading values, writing values, formatting, table creation, and worksheet protection.

from datetime import datetime

import pandas as pd

from shared.sharepoint.client_v2 import SharePointClientV2

sp = SharePointClientV2(default_site="Analytical Data")
file = await sp.get_file("/Reports/weekly_summary.xlsx")

async with file.create_workbook_session(
    persist_changes=True,
    auto_refresh=True,
) as workbook:
    sheet = await workbook.get_worksheet("Summary")

    existing = await sheet.read_cells("A1:D8")
    print(existing.values)

    await sheet.write_cells(
        "A1:D3",
        values=[
            ["Region", "Sales", "Updated At", "Status"],
            ["North", 50000, datetime.now(), "Ready"],
            ["South", 75000, datetime.now(), "Review"],
        ],
        number_format=[
            [None, "$#,##0.00", "yyyy-mm-dd hh:mm:ss", None],
            [None, "$#,##0.00", "yyyy-mm-dd hh:mm:ss", None],
            [None, "$#,##0.00", "yyyy-mm-dd hh:mm:ss", None],
        ],
    )

    await sheet.set_font("A1:D1", bold=True, size=12, color="#1F2937")
    await sheet.fill_cells("A1:D1", color="#E5E7EB")
    await sheet.draw_borders(
        "A1:D3",
        color="#9CA3AF",
        style="Continuous",
        weight="Thin",
        sides=["EdgeTop", "EdgeBottom", "EdgeLeft", "EdgeRight", "InsideHorizontal"],
    )
    await sheet.autofit_columns("A:D")

    table_df = pd.DataFrame(
        {
            "Sample": ["alpha", "beta", "gamma"],
            "Result": ["pass", "retry", "pass"],
            "Score": [91.2, 74.5, 88.0],
        }
    )
    table = await sheet.create_table_from_dataframe(
        table_df,
        start="F1",
        number_formats=["@", "@", "0.0"],
        alignments=["Left", "Left", "Right"],
        column_widths=["auto", "auto", 80],
        style="TableStyleMedium2",
        table_name="ResultsTable",
    )

    print(table.name)

    await sheet.protect(
        allow_format_cells=True,
        allow_format_columns=True,
        allow_sort=True,
        allow_auto_filter=True,
    )

Workbook sessions in practice

A workbook session is a live connection to the Excel workbook through Microsoft Graph.

There are a couple of parameters worth knowing well:

  • persist_changes=True: changes are saved back to the workbook when the session closes.
  • persist_changes=False: useful when you want to experiment, inspect behavior, or do a dry run without keeping the edits.
  • auto_refresh=True: useful for longer jobs; the client will keep the session alive for you.

If you leave auto_refresh=False, short tasks are still fine. The session logic will reopen a stale session before the next workbook operation when needed. For quick scripts, that is often enough. For longer multi-step jobs, I would still use auto_refresh=True and stop thinking about it.
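To make the stale-session behavior concrete, here is a conceptual model of lazy renewal. This is an illustration only, not the client's actual implementation, and the TTL is an arbitrary stand-in:

```python
import time

class SessionSketch:
    """Conceptual model of lazy session renewal (not the real client code)."""

    TTL = 5 * 60  # stand-in session lifetime, in seconds

    def __init__(self):
        self.opened_at = time.monotonic()
        self.reopens = 0

    def _ensure_fresh(self):
        # Before each operation, reopen the session if it has gone stale.
        if time.monotonic() - self.opened_at > self.TTL:
            self.opened_at = time.monotonic()
            self.reopens += 1

    def operation(self) -> str:
        self._ensure_fresh()
        return "ok"

session = SessionSketch()
session.opened_at -= 10 * 60  # simulate a long idle gap
print(session.operation(), session.reopens)  # ok 1
```

The point is that a stale session costs one extra round-trip at the next operation, whereas auto_refresh=True keeps it warm in the background.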

WorkbookSession API

Once you have a session, these are the main things you can do with it:

  1. list_worksheets()
    • Returns the worksheets in the workbook.
  2. get_worksheet(name_or_id: str)
    • Fetches a worksheet by name or ID.
  3. add_worksheet(name: str)
    • Creates a new worksheet.
  4. delete_worksheet(name_or_id: str)
    • Deletes a worksheet by name or ID.
  5. recalculate(calculation_type="Recalculate")
    • Triggers workbook recalculation.
    • Useful after formula-heavy edits.
  6. refresh_session()
    • Explicitly refreshes the session.
    • Most callers will not need this if auto_refresh=True.
  7. batch_request(requests, retry=True)
    • Advanced escape hatch for batching workbook API calls.
    • Mostly useful when you already understand the underlying Graph endpoints.

Here is a small example that creates a new sheet and recalculates formulas afterward:

async with file.create_workbook_session(auto_refresh=True) as workbook:
    report_sheet = await workbook.add_worksheet("Fresh Output")
    await report_sheet.write_cells("A1:B2", values=[["Metric", "Value"], ["Count", 42]])
    await workbook.recalculate("Full")

Working with worksheets

The Worksheet object is where most day-to-day workbook work happens.

Read cells and inspect ranges

read_cells() returns a WorksheetRange, not just raw values. That object includes dimensions, values, formulas, text, and number formats when the API provides them.

async with file.create_workbook_session() as workbook:
    sheet = await workbook.get_worksheet("Summary")
    cell_range = await sheet.read_cells("B2:D6")

    print(cell_range.address)
    print(cell_range.row_count, cell_range.column_count)
    print(cell_range.values)
    print(cell_range.number_format)

If you only need to know how much of a sheet is being used, get_bounds() is the most direct option:

bounds = await sheet.get_bounds()
print(bounds.row_index, bounds.row_count, bounds.column_index, bounds.column_count)

Write cells, formulas, and datetimes

write_cells() accepts a 2D array of values. Formula strings beginning with = are sent as formulas, and datetime values are formatted with the datetime_format argument before they go over the wire.

await sheet.write_cells(
    "A1:C3",
    values=[
        ["Date", "Amount", "Formula"],
        [datetime.now(), 1250, "=B2*0.2"],
        [datetime.now(), 1800, "=B3*0.2"],
    ],
    number_format=[
        ["yyyy-mm-dd hh:mm:ss", "$#,##0.00", None],
        ["yyyy-mm-dd hh:mm:ss", "$#,##0.00", None],
        ["yyyy-mm-dd hh:mm:ss", "$#,##0.00", None],
    ],
)

Two practical notes:

  • Shapes matter. values and number_format need to line up.
  • The API wants rectangular writes. If you need sparse edits, split them into separate calls.
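If your input rows can be ragged, a small helper can pad them into the rectangle the API expects. This helper is hypothetical, not part of the client; write_cells itself does no padding:

```python
def as_rectangle(rows: list[list], fill=None) -> list[list]:
    """Pad ragged rows to a rectangular 2D array.

    Hypothetical helper, not part of the client API.
    """
    width = max(len(row) for row in rows)
    return [list(row) + [fill] * (width - len(row)) for row in rows]

print(as_rectangle([["Region", "Sales"], ["North"]]))
# [['Region', 'Sales'], ['North', None]]
```

The same trick works for number_format arrays, which must match the shape of the values they format.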

Insert, delete, clear, merge

These methods are handy when you are laying out a report rather than just dropping data into a fixed template.

await sheet.insert_cells("A2:B4", shift="Down")
await sheet.delete_cells("D5:E5", shift="Left")
await sheet.clear_cells("F1:H20", apply_to="Contents")
await sheet.merge_cells("A1:D1")
await sheet.unmerge_cells("A1:D1")

Formatting and presentation

The formatting helpers are straightforward, and together they cover most report-building needs:

  • format_cells() for alignment, row height, column width, and wrapping
  • set_font() for bold, italic, size, color, font name, and underline
  • fill_cells() for background fills
  • draw_borders() for borders
  • autofit_columns() and autofit_rows() for cleanup after writing data

await sheet.format_cells(
    "A1:D20",
    horizontal_alignment="Center",
    vertical_alignment="Center",
    wrap_text=True,
)
await sheet.set_font("A1:D1", bold=True, size=11)
await sheet.fill_cells("A1:D1", color="#F3F4F6")
await sheet.autofit_rows("A1:D20")
await sheet.autofit_columns("A:D")

One small implementation detail that is worth knowing: draw_borders() is internally batched because Excel border updates are naturally multi-call operations. You get the nicer API without having to think about the request fan-out.
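Conceptually, the fan-out amounts to one logical request per border side, which the client then batches into a single round-trip. The sketch below only illustrates the shape of that fan-out; it is not the client's actual internals:

```python
def border_requests(address: str, sides: list[str], color: str,
                    style: str, weight: str) -> list[dict]:
    # Illustration only: one logical request per border side, which the
    # client batches together rather than sending one at a time.
    return [
        {"range": address, "side": side, "color": color,
         "style": style, "weight": weight}
        for side in sides
    ]

requests = border_requests("A1:D3", ["EdgeTop", "EdgeBottom"], "#9CA3AF",
                           "Continuous", "Thin")
print(len(requests))  # 2
```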

Sheet structure and visibility

You can also rename and reposition sheets, or hide them:

await sheet.update(name="Summary Output", position=0, visibility="Visible")

Accepted visibility values are:

  • "Visible"
  • "Hidden"
  • "VeryHidden"

Worksheet protection

Protection is built in, and it is granular enough to be practical.

await sheet.protect(
    allow_format_cells=True,
    allow_format_columns=True,
    allow_insert_rows=False,
    allow_delete_rows=False,
    allow_sort=True,
    allow_auto_filter=True,
    allow_pivot_tables=True,
)

# Later, if needed
await sheet.unprotect()

This is especially useful for generated reports where you want people to filter and sort, but not accidentally rewrite the layout.

Tables

If you already have data on the sheet, you can turn a range into an Excel table:

table = await sheet.create_table("A1:C10", has_headers=True)

If your data starts as a pandas DataFrame, create_table_from_dataframe() is usually the better choice because it writes the values, applies optional formatting, and creates the Excel table in one go:

table = await sheet.create_table_from_dataframe(
    df,
    start="A1",
    header=True,
    columns=["sample_id", "status", "score"],
    number_formats=["@", "@", "0.00"],
    column_widths=["auto", "auto", 90],
    alignments=["Left", "Left", "Right"],
    table_name="SampleResults",
    style="TableStyleLight1",
)

This method is one of the nicest parts of the workbook API. If your end goal is “put this DataFrame into Excel and make it look respectable,” it does a lot of the fiddly work for you.

Table API

Once you have a Table, you can manage its shape and contents directly.

Inspect a table

table = await sheet.get_table("SampleResults")

columns = await table.list_columns()
rows = await table.list_rows()
table_range = await table.get_range()

print(columns)
print(rows)
print(table_range.address)

Add, update, and delete rows

await table.add_row(["sample-004", "pass", 93.1])
await table.add_rows(
    [
        ["sample-005", "retry", 70.4],
        ["sample-006", "pass", 89.8],
    ]
)
await table.update_row(0, ["sample-001", "pass", 95.0])
await table.delete_row(1)

Add, update, and delete columns

new_column = await table.add_column(name="Reviewer", values=[["EA"], ["MB"], ["TT"]])
await table.update_column(new_column.id, name="Reviewed By")
await table.delete_column(new_column.id)

Update table properties and filters

await table.update(name="FinalResults", style="TableStyleMedium2", show_totals=True)
await table.clear_filters()
await table.reapply_filters()

And if you need to remove the table entirely:

await table.delete()

Choosing between workbook sessions and local workbook reads

This is the part that usually saves the most time:

  • Reach for create_workbook_session() when you want in-place edits in SharePoint.
  • Reach for read_workbook() when you need features from openpyxl that are outside the Graph workbook API.
  • Reach for read_dataframe() when you just need sheet data and want the shortest path to pandas.

It is completely normal to use both styles in the same project. A lot of workflows read source data with pandas or openpyxl, then use workbook sessions only for the final polished output workbook.

A few gotchas worth knowing

  • create_workbook_session(...) is the current API. Older examples may refer to open_workbook(...); that is stale.
  • write_cells() expects a rectangular 2D array.
  • number_format should match the shape of the written values.
  • For longer jobs, prefer auto_refresh=True.
  • If you are testing formatting or output logic, use a safe destination workbook in a test folder rather than a team workbook people rely on every day.
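For that last point, the IS_PROD / IS_DEV pattern from the general considerations works well. In this sketch a plain boolean stands in for the real shared.env flags, and the paths are illustrative:

```python
# Sketch: pick a safe destination while developing. In real code the flag
# would come from `shared.env` (IS_PROD / IS_DEV); these paths are examples.
def output_path(is_prod: bool) -> str:
    if is_prod:
        return "/Reports/weekly_summary.xlsx"
    return "/99. Tests/Outputs/weekly_summary.xlsx"

print(output_path(False))  # /99. Tests/Outputs/weekly_summary.xlsx
```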

Usage in Streamlit

Using the SharePointClientV2 in Streamlit requires one specific trick:

from data_tools.async_streamlit import run_async_main

async def main():
    # actual logic
    ...

if __name__ == "__main__":
    # you can pass any positional and keyword arguments for `main`
    # to `run_async_main` itself
    run_async_main(main)

The reason is a bit technical: it prevents multiple versions of the page from executing code at the same time when Streamlit reruns the page while an async task (such as a file upload) is still running. I invite you to read the comment on run_async_entrypoint for more detail.

In addition, two utilities are available to replace st.cache_resource and st.cache_data for async functions:

from data_tools.async_streamlit import st_cache_resource_async, st_cache_data_async

Gotchas

  • The SharePointDB connects to the production database by default, where the automation server syncs changes every 30 seconds. This means you may not see changes immediately with its methods. If the automation server has an issue, the files and folders table may not sync at all (should be rare, but worth knowing).

Advanced: SharePointDB

SharePointDB provides a PostgreSQL-cached view of SharePoint file and folder structures. It solves several key challenges:

  • Fast recursive listing: Query thousands of files in milliseconds instead of making hundreds of API calls
  • Change tracking: Continuously synchronize the latest SharePoint changes using Microsoft's delta API
  • Filtering: Fast filtering by modification date, path patterns, and other criteria
  • Deletion tracking: Keep track of deleted files (the API doesn't return deleted items after some time)

Why Use SharePointDB?

The Microsoft Graph API has limitations:

  • Rate limits (throttling after ~6 requests/second)
  • No recursive listing endpoint (must query each folder individually)
  • Paginated responses require multiple round-trips
  • Delta tokens expire, requiring full re-sync

SharePointDB solves these by syncing changes to PostgreSQL tables, enabling fast SQL queries.

Getting Started

from shared.sharepoint.client_v2 import SharePointClientV2

sp = SharePointClientV2(default_site="Analytical Data")

# Use as context manager for automatic connection handling
async with sp.db_client() as db:
    # All queries go here
    files = await db.list_files("/Reports", recursive=True)

Listing Files and Folders

from datetime import datetime, timedelta

async with sp.db_client() as db:
    # List files in a folder (non-recursive)
    files = await db.list_files("/Reports")

    # List files recursively
    files = await db.list_files("/Reports", recursive=True)

    # Filter by modification date
    recent = await db.list_files(
        "/Reports",
        recursive=True,
        modified_since=datetime.now() - timedelta(days=7),
    )

    # Include deleted files (useful for tracking removals)
    all_files = await db.list_files(
        "/Reports",
        recursive=True,
        include_deleted=True,
    )

    # List folders
    folders = await db.list_folders("/Reports", recursive=True)

Querying Multiple Paths

Use MultiPath to query multiple folders efficiently in a single query:

from shared.sharepoint.db_client import MultiPath

async with sp.db_client() as db:
    files = await db.list_files(
        MultiPath("/Folder1", "/Folder2", "/Folder3"),
        recursive=True,
    )

Getting Specific Items

async with sp.db_client() as db:
    # Get file by path
    file = await db.get_file("/Reports/Q4/summary.xlsx")

    # Get folder by path
    folder = await db.get_folder("/Reports/Q4")

    # Check existence
    exists = await db.file_exists("/Reports/Q4/summary.xlsx")
    exists = await db.folder_exists("/Reports/Q4")

    # Get multiple files by ID (useful in sync callbacks)
    files = await db.get_files_by_ids(["id1", "id2", "id3"])

Syncing Changes

The automation server syncs changes every 30 seconds automatically. For manual sync or custom processing:

async with sp.db_client() as db:
    # Set up the schema (only needed once per site)
    await db.setup_sync_schema("Analytical Data")

    # Incremental sync (continues from the last token)
    has_more = await db.sync_changes("Analytical Data")

    # Full resync from scratch
    has_more = await db.sync_changes("Analytical Data", from_scratch=True)

    # Limit iterations (useful for testing)
    has_more = await db.sync_changes("Analytical Data", max_iters=10)

Processing Changes with Callbacks

Process each batch of changes as they're synced:

from shared.sharepoint.db_client import SharePointDB, SyncBatch

async def on_batch(db: SharePointDB, batch: SyncBatch):
    # batch.changes contains a ChangeSplit with:
    # - folders: list[FolderUpdate]
    # - files: list[FileUpdate]
    # - folder_deletes: list[str] (IDs)
    # - file_deletes: list[str] (IDs)

    for file_update in batch.changes.files:
        print(f"File updated: {file_update.name}")
        print(f"  In folder: {file_update.parent_id}")
        print(f"  Modified: {file_update.last_modified}")
        print(f"  Size: {file_update.size}")

    for file_id in batch.changes.file_deletes:
        print(f"File deleted: {file_id}")

    # Fetch the full file objects if needed
    files = await db.get_files_by_ids([f.id for f in batch.changes.files])

async with sp.db_client() as db:
    await db.sync_changes("Analytical Data", on_batch=on_batch)

Change Data Types

from shared.sharepoint.db_client import (
    FileUpdate,    # File created or modified
    FileDelete,    # File deleted (contains only the ID)
    FolderUpdate,  # Folder created or modified
    FolderDelete,  # Folder deleted (contains only the ID)
    ChangeSplit,   # Container for all change types
    SyncBatch,     # Batch info passed to callbacks
)

# FileUpdate fields:
# - id, name, url, created_at, created_by, last_modified, last_modified_by, size, parent_id

# FolderUpdate fields:
# - id, name, url, created_at, created_by, last_modified, last_modified_by, size, children, parent_id

Database Schema

SharePointDB creates tables in the sharepoint schema:

  • sharepoint.deltas - Stores delta sync tokens per site
  • sharepoint.{site}_folders - Folder metadata with path hierarchy
  • sharepoint.{site}_files - File metadata linked to parent folders

The folder table uses a recursive CTE to maintain full paths, enabling fast path-based queries.

Best Practices

  1. Use context manager: Always use async with sp.db_client() as db: for proper connection handling
  2. Rely on automation sync: The server syncs every 30 seconds; manual sync is rarely needed
  3. Use modified_since for incremental processing: Filter by date to process only new/changed files
  4. Batch operations: Use MultiPath when querying multiple folders
  5. Handle deletions: Use include_deleted=True when tracking file removals

Connection Configuration

By default, SharePointDB connects to the production database. For custom connections:

# Custom PostgreSQL URL
async with sp.db_client(pg_url="postgresql://...") as db:
    ...