SharePoint automation (v1)
The SharePointClient class provides a convenient Python interface for interacting with SharePoint sites:
- Seamlessly upload, download, and manage files and folders.
- Read Excel or CSV data into pandas DataFrames.
- Perform bulk operations like recursive folder traversal.
- Handle large file sizes with chunked uploads.
- Leverage built-in retry logic for network reliability.
Initialization
from shared.sharepoint.client import SharePointClient
# Create a client for the "Stoneware" SharePoint site.
client = SharePointClient("Stoneware")
Parameters:
- site (
str | None
): The name of the SharePoint site. IfNone
, defaults to the root site. - max_retries (
int
, optional): Maximum number of retry attempts for throttled or failed operations. Defaults to3
.
Notes:
- Creating a SharePointclient instance for a site requires an authentication request. You should only do it when you need the client and never in the global scope. Otherwise, every time your code is imported will trigger a network request.
- When you call
SharePointClient(site)
multiple times within the same thread, you get the same client instance. This makes it easier to deal with the requirement above, as repeated creation calls for the samesite
use a single session and do not repeat authentication calls.
Using the client
Most methods expect to receive a path. It can be provided as a string ("Folder/Subfolder/filename.ext"
) or a list of strings (["Folder", "Subfolder", "filename.ext"]), the client supports both.
Also, most methods are available on both the client and its associated File
and Folder
objects, which can be obtained by using get_file(path)
and get_folder(path)
. You can use one or the other, they're generally equivalent. For example client.read_file(path)
does client.get_file(path).read()
under the hood.
The File
and Folder
objects are helpful to access metadata or for more advanced use cases. They store a reference to their client internally, so you can mix files and folders from different sites without a problem.
Retrieving Files and Folders
-
get_file(path: str | list[str])
→File
Returns aFile
object representing a file at the given path. -
get_folder(path: str | list[str])
→Folder
Returns aFolder
object representing a folder at the given path.
Checking Existence
-
file_exists(path: str | list[str])
→bool
Checks if a file exists at the given path. -
folder_exists(path: str | list[str])
→bool
Checks if a folder exists at the given path.
Listing Files and Folders
-
list_files(path: str | list[str])
→list[str]
Returns a list of filenames in the specified folder. -
list_folders(path: str | list[str])
→list[str]
Returns a list of subfolder names in the specified folder.
These two methods return a list of strings for historical compatibility. For more advanced listing capabilities, use get_folder(path)
to get a Folder
object, which provides:
-
list_files(recursive: bool = False)
→list[File]
Lists all files in the folder, optionally including files in subfolders recursively. Recursive calls for large folder trees can take multiple minutes. -
list_folders(recursive: bool = False)
→list[Folder]
Lists all subfolders, optionally including nested folders recursively. Recursive calls for large folder trees can take multiple minutes.
Uploading and Downloading
-
upload_file(path: str | list[str], file: str | Path | bytes | io.BytesIO, chunksize: int = int(3.5e6))
Uploads a file (bytes,io.BytesIO
, or local file path) to the specified location on SharePoint. Automatically uses chunked uploads for large files (>3.5 MB). -
download_file(path: str | list[str], out: str | Path)
Downloads a file from SharePoint to your local filesystem.
Reading into a DataFrame
-
read_file(path: str | list[str])
→bytes
Reads the binary contents of a file from SharePoint. -
read_dataframe(path: str | list[str], sheet_name: int | str = 0, **kwargs)
→pandas.DataFrame
Reads an Excel file from SharePoint into a pandas DataFrame.
Folder and File Management
-
create_folder(path: str | list[str])
→Folder
Ensures the specified folder path exists on SharePoint (creating intermediate folders if necessary) and returns the resultingFolder
object. -
move_folder(current_path: str | list[str], new_path: str | list[str])
Moves a folder fromcurrent_path
tonew_path
. -
rename_file(path: str | list[str], new_name: str)
Renames a file at the given path. -
move_file(path: str | list[str], new_folder: str | list[str], overwrite: bool = False)
Moves a file to a new folder, optionally overwriting an existing file. -
delete_file(path: str | list[str])
Deletes a file at the given path.
Shapefile Support
load_shapefile(path: str | list[str])
→geopandas.GeoDataFrame
Locates and downloads all mandatory Shapefile components (e.g.,.shp
,.dbf
,.shx
) from a folder, then reads them into a GeoDataFrame.
Example Usage
from shared.sharepoint.client import SharePointClient
# Instantiate a SharePoint client
client = SharePointClient("AnalyticalData")
# 1. Creating a new folder
folder = client.create_folder("new-data-folder")
# 2. Uploading a local file
local_file_path = "./data/sample.xlsx"
client.upload_file(["new-data-folder", "sample.xlsx"], local_file_path)
# 3. Reading that file into a DataFrame
df = client.read_dataframe(["new-data-folder", "sample.xlsx"], sheet_name=0)
# 4. Downloading the file
client.download_file(["new-data-folder", "sample.xlsx"], "./downloaded_sample.xlsx")
# 5. Checking if the file exists
if client.file_exists(["new-data-folder", "sample.xlsx"]):
print("File successfully uploaded!")
# 6. Moving the file
client.move_file(
["new-data-folder", "sample.xlsx"],
["archived-data", "sample_archived.xlsx"],
overwrite=True
)
# 7. Deleting the old folder
client.get_folder("new-data-folder").delete()
Error Handling and Retries
All SharePoint operations invoke an internal _execute_query()
method that uses incremental retry logic for up to max_retries
. If throttling or network errors occur, the client automatically waits and retries with exponential backoff.
You can control the default number of retries using the max_retries
argument:
client = SharePointClient("AnalyticalData", max_retries=5)
V2
Overview
SharePointClientV2 is designed to simplify reading, writing, and managing files/folders within a SharePoint Document Library, as well as interacting with Excel workbooks via Microsoft Graph. The library supports:
- Authentication and token management for Microsoft Graph
- Throttling, retry, and monitoring of long-running operations
- Handling files and folders (
SharePointFile
,SharePointFolder
) - Running queries and uploads/downloads for large files
- Creating and managing workbook sessions to read and update Excel files in SharePoint
Most methods use asyncio, providing concurrency-friendly operations.
Key Classes
SharePointClientV2
This is the main entry point for SharePoint interactions.
Key Responsibilities:
- Managing OAuth2 tokens
- Providing methods to list and create sites, files, folders
- Handling uploads and downloads (chunked for large files)
- Managing workbook sessions
- Caching references to site/directory info
- Managing concurrency through built-in rate limits, retries, and backoff
Highlighted Methods:
-
db_client(pg_url=PGURL) -> SharePointDB
- Returns a reference to a database-backed helper (
SharePointDB
) that can synchronize items to a Postgres DB. - Helpful if you want to track updates from the Graph API.
- Returns a reference to a database-backed helper (
-
get_file(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFile | None
- Retrieves a file by path, returning a
SharePointFile
. - Accepts an optional
site
to choose which SharePoint Site to use, otherwise uses the default. - If
allow_not_found=True
, returnsNone
if the file is not found.
- Retrieves a file by path, returning a
-
get_folder(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFolder | None
- Retrieves a folder by path, returning a
SharePointFolder
. - If
allow_not_found=True
, returnsNone
if the folder is not found.
- Retrieves a folder by path, returning a
-
ensure_folder(path: str | list[str], site: str | None = None) -> SharePointFolder
- If the folder does not exist, creates it. Otherwise returns the existing folder.
-
item_exists(path: str | list[str], site: str | None = None) -> bool
- Checks if a file or folder exists at the given path.
-
list_sites()
- Lists all available SharePoint sites.
-
_create_link(...)
- Internal method to create a sharing link. Typically used through
SharePointFile.create_link(...)
orSharePointFolder.create_link(...)
.
- Internal method to create a sharing link. Typically used through
-
_read_file(file_id: str, site: str) -> bytes
- Internal method that fetches the file’s binary contents.
-
_upload_file(folder_id: str, name: str, file: Path|bytes, ...) -> None
- Internal method to handle small or chunked file uploads.
-
close()
- Closes the underlying async session.
Initialization:
from shared.sharepoint.client_v2 import SharePointClientV2
async with SharePointClientV2(default_site="MySiteName") as sp:
# sp is ready for usage
# e.g. file = await sp.get_file(["folder", "subfolder", "myfile.docx"])
pass
SharePointFile
Represents a file in SharePoint.
Key Attributes:
id
: Unique ID of the file in SharePointname
: Name of the file (filename)path
: Path of the file, e.g./Some/Folder/filename.docx
url
: SharePoint web URL for the filesize
: File size in bytesdeleted
: Boolean indicating if the file has been marked deleted in SharePoint
Key Methods:
read() -> bytes
- Reads the file content into memory.
download(out: str | Path) -> None
- Downloads the file locally to the specified path.
rename(name: str | None, description: str | None = None) -> SharePointFile
- Renames the file (and optionally sets a description).
move(path: str | list[str], new_name: str | None = None)
- Moves the file to a different folder path and optionally renames it.
copy(path: str | list[str], new_name: str | None = None)
- Copies the file to a different path, optionally changing its name.
delete()
- Permanently deletes the file.
create_workbook_session(persist_changes=True, auto_refresh=False) -> WorkbookSession
- Initiates a workbook session if the file is an Excel workbook.
persist_changes
means changes persist once the session closes.auto_refresh
automatically refreshes the session token.
SharePointFolder
Represents a folder in SharePoint.
Key Attributes:
id
: Unique folder IDname
: Folder namepath
: Full path for the folder, e.g./Documents/Reports
children
: Number of immediate children in the folderdeleted
: Boolean indicating if the folder is marked as deleted
Key Methods:
list_children() -> (List[SharePointFile], List[SharePointFolder])
- Lists all child items in the folder.
list_files() -> List[SharePointFile]
andlist_folders() -> List[SharePointFolder]
- Helper methods to list files only or folders only.
upload_file(name: str, file: str | Path | bytes, chunksize=50_000_000)
- Uploads a new file to the folder; large files are chunked by default.
delete_child(name: str) -> None
- Deletes a file or folder by name.
ensure_folder(path: str | list[str]) -> SharePointFolder
- Creates subfolders as needed.
rename(name: str | None, description: str | None = None)
- Renames the folder.
move(...)
,copy(...)
,delete(...)
- Similar to
SharePointFile
operations.
- Similar to
WorkbookSession
Represents an active session for editing an Excel workbook stored in SharePoint (via the Graph API). Created via SharePointFile.create_workbook_session(...)
.
Key Methods:
list_worksheets() -> List[Worksheet]
- Returns all worksheets within the workbook.
get_worksheet(name_or_id: str) -> Worksheet
- Retrieves a worksheet by its name or ID.
close()
- Closes the session. If
persist_changes=True
, changes are committed.
- Closes the session. If
batch_request(requests: List[dict]) -> List[dict]
- Sends multiple Excel Graph operations in a single batch call.
refresh_session()
- Extends the session lifetime to avoid expiration.
__aenter__
/__aexit__
- Allows
async with WorkbookSession(...)
usage.
- Allows
Within each WorkbookSession
, you can perform advanced operations on worksheets:
- reading/writing cells
- adjusting formatting
- creating tables
- etc.
Examples and Usage
Initializing the Client
from shared.sharepoint.client_v2 import SharePointClientV2
import asyncio
async def main():
# Create the client with a default site.
async with SharePointClientV2(default_site="Stoneware") as sp:
# Now sp can manage files, folders, and Excel content.
pass
asyncio.run(main())
Listing Files and Folders
files = await sp.list_files(
folder="/Data/Subfolder",
recursive=True,
site="Stoneware" # optional if default is set
)
for f in files:
print("File:", f.name, f.url, f.size)
Uploading and Downloading Files
# Upload a local file to a SharePoint folder
folder = await sp.get_folder("/Documents/Reports")
await folder.upload_file("report.pdf", "./local_report.pdf")
# Download the newly uploaded file
file = await sp.get_file("/Documents/Reports/report.pdf")
await file.download("./report_downloaded.pdf")
Working with Workbooks and Worksheets
# Suppose we have an Excel file named "PSA_Mastersheet.xlsx" in the "PSA Data" folder
# We want to open it and read some data.
file = await sp.get_file(["PSA Data", "PSA_Mastersheet.xlsx"])
async with file.create_workbook_session(persist_changes=True) as wb:
sheets = await wb.list_worksheets()
first_sheet = sheets[0]
# Clear some cells
await first_sheet.clear_cells("A1:B10", apply_to="All")
# Write some data
await first_sheet.write_cells(
address="A1",
values=[["Sample ID", "Run", "Date"], ["ABC123", "R1", "2025-02-11"]],
)
# Optionally create a table from a DataFrame
# data_frame = pd.DataFrame(...)
# await first_sheet.create_table_from_dataframe(
# df=data_frame,
# start="A1",
# header=True,
# style="TableStyleMedium2",
# )
Incorporating in PSA code
Within psa.py
, we can see how some of these structures might be used:
SharePointFile
objects are passed around to parse file names, read content, etc.- Large data transformations are done in memory (pandas) and then optionally appended back to SharePoint.
- Some tasks (like reading CSV data) rely on
file.read()
to get the raw content. - The
psa_mastersheet
asset inpsa.py
references theSharePointClientV2("PSA Data")
, obtains an Excel file, and updates it with the new data.
The typical pattern is:
- Create or retrieve a client
- Grab references to specific folders or files
- For Excel, open a
WorkbookSession
- Make the necessary range edits (clear cells, update values, create tables, etc.)
- Close the session (or let the
async with
block do it), persisting changes.
That covers the main functionality and usage patterns for SharePointClientV2
, SharePointFile
, SharePointFolder
, and WorkbookSession
.
Additional Classes (not covered in detail but also used internally):
Worksheet
/WorksheetRange
for Excel cell range operationsTable
,TableColumn
,TableRow
for advanced table-based operationsSharePointDB
for caching/synchronizing a site structure in a Postgres database.
For further references, see the docstrings within each method and the usage examples in the psa.py
module.
Last Updated: 2025-02-11.