SharePoint automation (v1)
The SharePointClient class provides a convenient Python interface for interacting with SharePoint sites:
- Seamlessly upload, download, and manage files and folders.
- Read Excel or CSV data into pandas DataFrames.
- Perform bulk operations like recursive folder traversal.
- Handle large file sizes with chunked uploads.
- Leverage built-in retry logic for network reliability.
Initialization
from shared.sharepoint.client import SharePointClient
# Create a client for the "Stoneware" SharePoint site.
client = SharePointClient("Stoneware")
Parameters:
- site (
str | None): The name of the SharePoint site. IfNone, defaults to the root site. - max_retries (
int, optional): Maximum number of retry attempts for throttled or failed operations. Defaults to3.
Notes:
- Creating a SharePointclient instance for a site requires an authentication request. You should only do it when you need the client and never in the global scope. Otherwise, every time your code is imported will trigger a network request.
- When you call
SharePointClient(site)multiple times within the same thread, you get the same client instance. This makes it easier to deal with the requirement above, as repeated creation calls for the samesiteuse a single session and do not repeat authentication calls.
Using the client
Most methods expect to receive a path. It can be provided as a string ("Folder/Subfolder/filename.ext") or a list of strings (["Folder", "Subfolder", "filename.ext"]), the client supports both.
Also, most methods are available on both the client and its associated File and Folder objects, which can be obtained by using get_file(path) and get_folder(path). You can use one or the other, they're generally equivalent. For example client.read_file(path) does client.get_file(path).read() under the hood.
The File and Folder objects are helpful to access metadata or for more advanced use cases. They store a reference to their client internally, so you can mix files and folders from different sites without a problem.
Retrieving Files and Folders
-
get_file(path: str | list[str])→File
Returns aFileobject representing a file at the given path. -
get_folder(path: str | list[str])→Folder
Returns aFolderobject representing a folder at the given path.
Checking Existence
-
file_exists(path: str | list[str])→bool
Checks if a file exists at the given path. -
folder_exists(path: str | list[str])→bool
Checks if a folder exists at the given path.
Listing Files and Folders
-
list_files(path: str | list[str])→list[str]
Returns a list of filenames in the specified folder. -
list_folders(path: str | list[str])→list[str]
Returns a list of subfolder names in the specified folder.
These two methods return a list of strings for historical compatibility. For more advanced listing capabilities, use get_folder(path) to get a Folder object, which provides:
-
list_files(recursive: bool = False)→list[File]
Lists all files in the folder, optionally including files in subfolders recursively. Recursive calls for large folder trees can take multiple minutes. -
list_folders(recursive: bool = False)→list[Folder]
Lists all subfolders, optionally including nested folders recursively. Recursive calls for large folder trees can take multiple minutes.
Uploading and Downloading
-
upload_file(path: str | list[str], file: str | Path | bytes | io.BytesIO, chunksize: int = int(3.5e6))
Uploads a file (bytes,io.BytesIO, or local file path) to the specified location on SharePoint. Automatically uses chunked uploads for large files (>3.5 MB). -
download_file(path: str | list[str], out: str | Path)
Downloads a file from SharePoint to your local filesystem.
Reading into a DataFrame
-
read_file(path: str | list[str])→bytes
Reads the binary contents of a file from SharePoint. -
read_dataframe(path: str | list[str], sheet_name: int | str = 0, **kwargs)→pandas.DataFrame
Reads an Excel file from SharePoint into a pandas DataFrame.
Folder and File Management
-
create_folder(path: str | list[str])→Folder
Ensures the specified folder path exists on SharePoint (creating intermediate folders if necessary) and returns the resultingFolderobject. -
move_folder(current_path: str | list[str], new_path: str | list[str])
Moves a folder fromcurrent_pathtonew_path. -
rename_file(path: str | list[str], new_name: str)
Renames a file at the given path. -
move_file(path: str | list[str], new_folder: str | list[str], overwrite: bool = False)
Moves a file to a new folder, optionally overwriting an existing file. -
delete_file(path: str | list[str])
Deletes a file at the given path.
Shapefile Support
load_shapefile(path: str | list[str])→geopandas.GeoDataFrame
Locates and downloads all mandatory Shapefile components (e.g.,.shp,.dbf,.shx) from a folder, then reads them into a GeoDataFrame.
Example Usage
from shared.sharepoint.client import SharePointClient
# Instantiate a SharePoint client
client = SharePointClient("AnalyticalData")
# 1. Creating a new folder
folder = client.create_folder("new-data-folder")
# 2. Uploading a local file
local_file_path = "./data/sample.xlsx"
client.upload_file(["new-data-folder", "sample.xlsx"], local_file_path)
# 3. Reading that file into a DataFrame
df = client.read_dataframe(["new-data-folder", "sample.xlsx"], sheet_name=0)
# 4. Downloading the file
client.download_file(["new-data-folder", "sample.xlsx"], "./downloaded_sample.xlsx")
# 5. Checking if the file exists
if client.file_exists(["new-data-folder", "sample.xlsx"]):
print("File successfully uploaded!")
# 6. Moving the file
client.move_file(
["new-data-folder", "sample.xlsx"],
["archived-data", "sample_archived.xlsx"],
overwrite=True
)
# 7. Deleting the old folder
client.get_folder("new-data-folder").delete()
Error Handling and Retries
All SharePoint operations invoke an internal _execute_query() method that uses incremental retry logic for up to max_retries. If throttling or network errors occur, the client automatically waits and retries with exponential backoff.
You can control the default number of retries using the max_retries argument:
client = SharePointClient("AnalyticalData", max_retries=5)
V2
Overview
SharePointClientV2 is designed to simplify reading, writing, and managing files/folders within a SharePoint Document Library, as well as interacting with Excel workbooks via Microsoft Graph. The library supports:
- Authentication and token management for Microsoft Graph
- Throttling, retry, and monitoring of long-running operations
- Handling files and folders (
SharePointFile,SharePointFolder) - Running queries and uploads/downloads for large files
- Creating and managing workbook sessions to read and update Excel files in SharePoint
Most methods use asyncio, providing concurrency-friendly operations.
Key Classes
SharePointClientV2
This is the main entry point for SharePoint interactions.
Key Responsibilities:
- Managing OAuth2 tokens
- Providing methods to list and create sites, files, folders
- Handling uploads and downloads (chunked for large files)
- Managing workbook sessions
- Caching references to site/directory info
- Managing concurrency through built-in rate limits, retries, and backoff
Highlighted Methods:
-
db_client(pg_url=PGURL) -> SharePointDB- Returns a reference to a database-backed helper (
SharePointDB) that can synchronize items to a Postgres DB. - Helpful if you want to track updates from the Graph API.
- Returns a reference to a database-backed helper (
-
get_file(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFile | None- Retrieves a file by path, returning a
SharePointFile. - Accepts an optional
siteto choose which SharePoint Site to use, otherwise uses the default. - If
allow_not_found=True, returnsNoneif the file is not found.
- Retrieves a file by path, returning a
-
get_folder(path: str | list[str], site: str | None = None, allow_not_found=False) -> SharePointFolder | None- Retrieves a folder by path, returning a
SharePointFolder. - If
allow_not_found=True, returnsNoneif the folder is not found.
- Retrieves a folder by path, returning a
-
ensure_folder(path: str | list[str], site: str | None = None) -> SharePointFolder- If the folder does not exist, creates it. Otherwise returns the existing folder.
-
item_exists(path: str | list[str], site: str | None = None) -> bool- Checks if a file or folder exists at the given path.
-
list_sites()- Lists all available SharePoint sites.
-
_create_link(...)- Internal method to create a sharing link. Typically used through
SharePointFile.create_link(...)orSharePointFolder.create_link(...).
- Internal method to create a sharing link. Typically used through
-
_read_file(file_id: str, site: str) -> bytes- Internal method that fetches the file’s binary contents.
-
_upload_file(folder_id: str, name: str, file: Path|bytes, ...) -> None- Internal method to handle small or chunked file uploads.
-
close()
- Closes the underlying async session.
Initialization:
from shared.sharepoint.client_v2 import SharePointClientV2
async with SharePointClientV2(default_site="MySiteName") as sp:
# sp is ready for usage
# e.g. file = await sp.get_file(["folder", "subfolder", "myfile.docx"])
pass
SharePointFile
Represents a file in SharePoint.
Key Attributes:
id: Unique ID of the file in SharePointname: Name of the file (filename)path: Path of the file, e.g./Some/Folder/filename.docxurl: SharePoint web URL for the filesize: File size in bytesdeleted: Boolean indicating if the file has been marked deleted in SharePoint
Key Methods:
read() -> bytes- Reads the file content into memory.
download(out: str | Path) -> None- Downloads the file locally to the specified path.
rename(name: str | None, description: str | None = None) -> SharePointFile- Renames the file (and optionally sets a description).
move(path: str | list[str], new_name: str | None = None)- Moves the file to a different folder path and optionally renames it.
copy(path: str | list[str], new_name: str | None = None)- Copies the file to a different path, optionally changing its name.
delete()- Permanently deletes the file.
create_workbook_session(persist_changes=True, auto_refresh=False) -> WorkbookSession- Initiates a workbook session if the file is an Excel workbook.
persist_changesmeans changes persist once the session closes.auto_refreshautomatically refreshes the session token.
SharePointFolder
Represents a folder in SharePoint.
Key Attributes:
id: Unique folder IDname: Folder namepath: Full path for the folder, e.g./Documents/Reportschildren: Number of immediate children in the folderdeleted: Boolean indicating if the folder is marked as deleted
Key Methods:
list_children() -> (List[SharePointFile], List[SharePointFolder])- Lists all child items in the folder.
list_files() -> List[SharePointFile]andlist_folders() -> List[SharePointFolder]- Helper methods to list files only or folders only.
upload_file(name: str, file: str | Path | bytes, chunksize=50_000_000)- Uploads a new file to the folder; large files are chunked by default.
delete_child(name: str) -> None- Deletes a file or folder by name.
ensure_folder(path: str | list[str]) -> SharePointFolder- Creates subfolders as needed.
rename(name: str | None, description: str | None = None)- Renames the folder.
move(...),copy(...),delete(...)- Similar to
SharePointFileoperations.
- Similar to