SharePoint automation (v2)
Context
SharePointClientV2
is a redesign of our SharePoint automation from scratch.
It aims to solve the following problems in the V1:
- Poor performance of the underlying Office 365 library
- Lack of clarity around amount of network requests
- Improve reliability by using the more modern Microsoft Graph API
- Provide new features, such as updating spreadsheets collaboratively while other users have them open
It enables you to read, write, navigate, and manage files and folders, as well as interact with Excel spreadsheets. The differences with the V1 are:
- Written from scratch, providing us with full control and using the modern Microsoft Graph API
- Async interface, enabling parallelization of network requests. This adds a little complexity, but pushes our code towards best performance.
- Read & update spreadsheets directly in SharePoint, even if other users have them open.
- Single client for all SharePoint sites
- Continuous synchronization of changes to database tables with the new
SharePointDB
- Fast recursive listing of files and folders with the new
SharePointDB
- More robust through better retry mechanisms, proper throttling to avoid hitting rate limits, and respect of "Retry-After" delays returned by the API.
- Batch queries (advanced, mostly used internally)
General considerations
- A single client can be used across all SharePoint sites. All methods on the SharePoint client have an optional
site
keyword argument. A default site can be passed when initializing the client, to avoid having to pass it explicitly every time. - Initializing multiple clients is also not a problem, they will share the same HTTP session and rate limit to maximize performance and avoid throttling.
- If you are processing multiple files in parallel, it's worth defining a
process_file
async function and running it concurrently on all files withasyncio.gather
- Try to work in safe locations when testing out your code during development. The recommended way to do that is to import
from shared.env import IS_PROD, IS_DEV
and rely on those flags to either output to the real folder or to a test folder. The recommended location for testing is the99. Tests/Outputs
folder in the Stoneware SharePoint site.
Working with files and folders
Full example
sp = SharePointClientV2("PSA Data")
# Navigate folders
folder = await sp.get_folder("/Data/Reports")
file = await sp.get_file("/Data/Reports/report.pdf")
# Get file/folder metadata
print(file.name, file.size, file.created_at, file.modified_at)
print(folder.name, folder.item_count, folder.created_at)
new_folder = await sp.create_folder("/Data/Reports/2024")
# `ensure_folder` creates the folder if it doesn't exist and doesn't error otherwise
new_folder = await sp.ensure_folder("/Data/Reports/2024")
file = await file.move("/Data/Reports/2024/report.pdf")
copy_file = await file.copy("/Data/Reports/2024/report_copy.pdf")
copy_file = await copy_file.rename("new_name.pdf")
await copy_file.delete()
folder = await folder.rename("Reports 2024")
# `recursive=True` is available, but not always fast. For large folder
# and file structures, using the `SharePointDB` is recommended (see below).
subfolders = await folder.list_folders()
files = await folder.list_files()
await folder.delete()
SharePointClientV2
This is the main entry point for SharePoint interactions.
Key Responsibilities:
- Providing methods to retrieve, list, and create sites, files, folders
- Handling uploads and downloads (chunked for large files)
- Managing workbook sessions
- Managing authentication tokens
- Managing concurrency through built-in rate limits, retries, and backoff
Constructor:
def __init__(
default_site: str | None = None,
monitoring: Monitoring | None = None,
logger: logging.Logger | PrintLogger | None = None,
) -> None
default_site
: Name of the default SharePoint site to use. If provided, methods won't need an explicit site parameter.monitoring
: Optional configuration for long-running operations (upload/copy). Defaults toMonitoring()
.logger
: Optional logger for debug/warning messages. Defaults toPrintLogger(level=logging.INFO)
.
Methods:
-
get_file(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFile | None
- Retrieves a file by path, returning a
SharePointFile
. - Accepts an optional
site
to choose which SharePoint Site to use, otherwise uses the default. - If
allow_not_found=True
, returnsNone
if the file is not found.
- Retrieves a file by path, returning a
-
get_folder(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFolder | None
- Retrieves a folder by path, returning a
SharePointFolder
. - If
allow_not_found=True
, returnsNone
if the folder is not found.
- Retrieves a folder by path, returning a
-
ensure_folder(path: str | list[str], site: str | None = None) -> SharePointFolder
- If the folder does not exist, creates it. Otherwise returns the existing folder.
-
item_exists(path: str | list[str], site: str | None = None) -> bool
- Checks if a file or folder exists at the given path.
-
list_sites()
- Lists all available SharePoint sites.
-
_create_link(...)
- Internal method to create a sharing link. Typically used through
SharePointFile.create_link(...)
orSharePointFolder.create_link(...)
.
- Internal method to create a sharing link. Typically used through
-
_read_file(file_id: str, site: str) -> bytes
- Internal method that fetches the file’s binary contents.
-
_upload_file(folder_id: str, name: str, file: Path|bytes, ...) -> None
- Internal method to handle small or chunked file uploads.
-
close()
- Closes the underlying async session (good practice for long-running scripts).
SharePointFile
Represents a file in SharePoint.
Attributes:
id
: Unique ID of the file in SharePointname
: Name of the file (filename)path
: Path of the file, e.g./Some/Folder/filename.docx
url
: SharePoint web URL for the filesize
: File size in bytesdeleted
: Boolean indicating if the file has been marked deleted in SharePoint
Methods:
read() -> bytes
- Reads the file content into memory.
download(out: str | Path) -> None
- Downloads the file locally to the specified path.
rename(name: str | None, description: str | None = None) -> SharePointFile
- Renames the file (and optionally sets a description).
move(path: str | list[str], new_name: str | None = None)
- Moves the file to a different parent folder and optionally renames it.
copy(path: str | list[str], new_name: str | None = None)
- Copies the file to a different parent folder and optionally renames it.
delete()
- Permanently deletes the file.
create_workbook_session(persist_changes=True, auto_refresh=False) -> WorkbookSession
- Initiates a workbook session if the file is an Excel workbook.
persist_changes
means changes persist once the session closes.auto_refresh
automatically refreshes the session token.
SharePointFolder
Represents a folder in SharePoint.
Attributes:
id
: Unique folder IDname
: Folder namepath
: Full path for the folder, e.g./Documents/Reports
children
: Number of immediate children in the folderdeleted
: Boolean indicating if the folder is marked as deleted
Methods:
list_children() -> (List[SharePointFile], List[SharePointFolder])
- Lists all child items in the folder.
list_files() -> List[SharePointFile]
andlist_folders() -> List[SharePointFolder]
- Helper methods to list files only or folders only.
upload_file(name: str, file: str | Path | bytes, chunksize=50_000_000)
- Uploads a new file to the folder; large files are chunked by default.
delete_child(name: str) -> None
- Deletes a file or folder by name.
ensure_folder(path: str | list[str]) -> SharePointFolder
- Creates subfolders as needed.
rename(name: str | None, description: str | None = None)
- Renames the folder.
move(path: str | list[str], new_name: str | None = None)
- Moves the folder to a different parent folder and optionally renames it.
copy(path: str | list[str], new_name: str | None = None)
- Copies the folder to a different parent folder and optionally renames it.
delete()
- Permanently deletes the folder.
Working with spreadsheets
This new version of the SharePoint client comes with the ability to read and modify spreadsheets without re-uploading the full file. This means, it is now possible to update a spreadsheet while other people are using it. In addition, we get access to new features, like locking.
Full example
from shared.sharepoint.client_v2 import SharePointClientV2
import pandas as pd
from datetime import datetime
import asyncio
sp = SharePointClientV2(default_site="MyTeamSite")
# Get an Excel file
file = await sp.get_file("/Reports/Sales.xlsx")
# Open workbook session with auto-refresh enabled
async with file.open_workbook(auto_refresh=True) as wb:
# Get worksheet by name
sheet = await wb.get_worksheet("Sales Data")
# Read existing data
data = await sheet.read_cells("A1:D10")
print(f"Found {data.row_count} rows of data")
# Write some values
values = [
["Region", "Sales", "Date", "Status"],
["North", 50000, datetime.now(), "Active"],
["South", 75000, datetime.now(), "Active"]
]
await sheet.write_cells("A1:D3", values)
# Apply formatting
await sheet.set_font("A1:D1", bold=True, size=12)
await sheet.fill_cells("A1:D1", color="#E6E6E6")
await sheet.draw_borders(
"A1:D3",
color="#000000",
style="Continuous",
weight="Thin",
sides=["EdgeBottom", "EdgeTop", "EdgeLeft", "EdgeRight", "InsideHorizontal"]
)
# Create table from pandas DataFrame
df = pd.DataFrame({
"Product": ["Widget A", "Widget B", "Widget C"],
"Price": [19.99, 24.99, 15.99],
"Stock": [100, 150, 75],
"LastUpdated": [datetime.now()] * 3
})
await sheet.create_table_from_dataframe(
df,
start="F1",
number_formats=["@", "$#,##0.00", "#,##0", "mm/dd/yyyy hh:mm"],
column_widths=["auto", 100, 80, 120],
alignments=["Left", "Right", "Right", "Center"],
style="TableStyleMedium2"
)
# Auto-fit some columns
await sheet.autofit_columns("A:D")
# Protect the worksheet
await sheet.protect(
allow_format_cells=True,
allow_format_columns=True,
allow_sort=True,
allow_filter=True
)
WorkbookSession
The entrypoint for spreadsheet operations is a workbook session. The client connects to the workbook (you can see a "SharePoint App" icon appear among the users in the web interface), makes changes, and saves them at the end of the connection. This is similar to a database connection, where changes are often committed at the end.
It can be initialized from a file:
sp = SharePointClientV2("PSA Data")
file = await sp.get_file("PSA_Mastersheet.xlsx")
# Create a workbook session within the context. Changes are automatically saved at the end.
async with file.create_workbook_session() as wb:
# Get worksheet by name
sheet = await wb.get_worksheet("Sheet1")
# Or get all worksheets
sheets = await wb.list_worksheets()
sheet = sheets[0]
# do work...
Methods:
list_worksheets() -> List[Worksheet]
- Returns all worksheets within the workbook.
get_worksheet(name_or_id: str) -> Worksheet
- Retrieves a worksheet by its name or ID.
add_worksheet(name: str)
- Add a new worksheet with the specified name.
delete_worksheet(name_or_id: str)
- Delete a worksheet by name or ID.
Worksheet
This is where the real work happens. Once you created a workbook session, you can access the different sheets and use them to read or update the spreadsheet contents.
Methods:
-
get_bounds() -> WorksheetBounds
- Returns the used range bounds (column/row indices and counts).
-
update(name: str | None = None, position: int | None = None, visibility: "Visible" | "Hidden" | "VeryHidden" | None = None)
- Updates worksheet properties like name, position, and visibility.
-
read_cells(address: str) -> WorksheetRange
- Reads cell contents and properties from the specified range.
- Address uses A1 notation (e.g. "A1:B2").
-
write_cells(address: str, values: List[List[Union[int, str, bool, datetime, None]]] | None = None, number_format: List[List[str | None]] | None = None, datetime_format: str = "%Y-%m-%d %H:%M:%S")
- Writes values and optional number formats to cells.
- Handles formulas (strings starting with "=").
- Converts datetime objects using the specified format.
-
insert_cells(address: str, shift: "Down" | "Right")
- Inserts cells and shifts existing ones down or right.
-
delete_cells(address: str, shift: "Up" | "Left")
- Deletes cells and shifts remaining ones up or left.
-
format_cells(address: str, column_width: float | None = None, horizontal_alignment: str | None = None, row_height: float | None = None, vertical_alignment: str | None = None, wrap_text: bool | None = None)
- Applies formatting like alignment, width/height, and text wrapping.
-
autofit_columns(address: str)
/autofit_rows(address: str)
- Automatically adjusts column widths or row heights for best fit.
-
draw_borders(address: str, color: str, style: str, weight: str, sides: list[str])
- Draws borders around or inside a range with specified style.
- Sides can be EdgeTop, EdgeBottom, EdgeLeft, EdgeRight, etc.
-
fill_cells(address: str, color: str)
- Fills cells with a background color (HTML code or name).
-
set_font(address: str, bold: bool | None = None, color: str | None = None, italic: bool | None = None, name: str | None = None, size: float | None = None, underline: str | None = None)
- Sets font properties for a range of cells.
-
clear_cells(address: str, apply_to: "All" | "Formats" | "Contents" = "All")
- Clears cell contents and/or formatting.
-
merge_cells(address: str, across: bool = False)
/unmerge_cells(address: str)
- Merges or unmerges cells in the specified range.
across=True
merges each row separately.
-
protect(allow_format_cells: bool = False, ...)
/unprotect()
- Adds or removes worksheet protection with granular permissions.
-
create_table(range_address: str, has_headers: bool = True)
- Creates a table from a range of cells.
-
list_tables()
/get_table(table_id_or_name: str)
- Lists all tables or gets a specific table.
-
create_table_from_dataframe(df: pd.DataFrame, start: str, header: bool | list[str] = True, number_formats: list[str | None] | None = None, columns: list[str] | None = None, column_widths: list[float | "auto"] | None = None, alignments: list[str] | None = None, table_name: str | None = None, style: str | None = None, datetime_format: str = "%Y-%m-%d %H:%M:%S")
- Creates a formatted Excel table from a pandas DataFrame.
- Parameters:
df
: The pandas DataFrame to convertstart
: Top-left cell address in A1 notation (e.g. "A1")header
: If True, use DataFrame column names. If list, use as custom headers. If False, no header.number_formats
: List of Excel number formats to apply per columncolumns
: Optional list of DataFrame columns to include (and their order)column_widths
: List of column widths in points, or "auto" for autofitalignments
: List of horizontal alignments per column ("Left", "Center", "Right")table_name
: Optional name for the Excel tablestyle
: Optional Excel table style (e.g. "TableStyleLight1")datetime_format
: strftime format for datetime values
- Returns the created Excel Table object
- Handles automatic conversion of pandas data types including datetimes
- Applies all formatting in an optimized batch operation
Usage in Streamlit
Using the SharePointClientV2 in Streamlit requires one specific trick:
from data_tools.async_streamlit import run_async_main
async def main():
# actual logic
...
if __name__ == "__main__":
# you can pass eventual arguments and keyword arguments for `main`
# to `run_async_main` itself.
run_async_main(main)
The reason for it is a bit technical, but this essentially prevents having multiple versions of the page executing code at the same time if Streamlit reruns the page while an async task is going on (like a file upload). I invite you to read the comment for run_async_entrypoint
for more detail.
In addition to this, two utilities are available to replace st.cache_resource
and st.cache_data
, for async functions:
from data_tools.async_streamlit import st_cache_resource_async, st_cache_resource_data
Gotchas
- The
SharePointDB
connects to the production database by default, where the automation server syncs changes every 30 seconds. This means you may not see changes immediately with its methods. If the automation server has an issue, the files and folders table may not sync at all (should be rare, but worth knowing).
Advanced: SharePointDB
[WIP]
SharePointDB
essentially exists to enable incremental SharePoint assets. It had to solve the following challenges:
- Continuously synchronize the latest SharePoint changes to some data structure
- Very quickly perform
.list_files(recursive=True)
and.list_folders(recursive=True)
operations - Enable fast filtering by last update date and other criteria
- Keep track of deleted files