PSA - Particle Size Analysis
Particle Size Analysis (PSA) uses laser diffraction to measure the grain size distribution of powdered materials. It provides cumulative distribution curves and quantile statistics (D-values) that characterize the particle size distribution.
Method Overview
| Property | Value |
|---|---|
| Full Name | Particle Size Analysis (Laser Diffraction) |
| Purpose | Measures grain size distribution of powdered materials |
| Output | Cumulative distribution curve and quantile statistics |
| Output Units | Particle diameter in microns (um) |
| SharePoint Location | PSA Data |
| File Format | CSV |
Key Metrics
Each D-value (DX) is the particle diameter below which X% of the particles fall:
| Metric | Description |
|---|---|
| D5 | 5% of particles are smaller |
| D10 | 10% of particles are smaller |
| D20 | 20% of particles are smaller |
| D30 | 30% of particles are smaller |
| D40 | 40% of particles are smaller |
| D50 | Median particle size (50% smaller) |
| D60 | 60% of particles are smaller |
| D70 | 70% of particles are smaller |
| D80 | 80% of particles are smaller |
| D90 | 90% of particles are smaller |
| D95 | 95% of particles are smaller |
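As an illustration of how a D-value relates to the cumulative curve, the sketch below reads a quantile diameter off a set of (diameter, undersize) points by linear interpolation. The function name `d_value` and the choice of linear interpolation are assumptions made for this example; the pipeline may interpolate differently (for instance on a log-diameter axis).

```python
import numpy as np

def d_value(diameter, undersize, percent):
    """Return the diameter at which `percent` of the distribution is undersize.

    Hypothetical helper: linear interpolation between measured points.
    `undersize` must be non-decreasing and expressed in percent (0-100).
    """
    d = np.asarray(diameter, dtype=float)
    u = np.asarray(undersize, dtype=float)
    # np.interp(x, xp, fp): here the cumulative % is the x-axis and the
    # diameter is the value being interpolated.
    return float(np.interp(percent, u, d))

diameters = [1.0, 10.0, 100.0]   # um
undersize = [0.0, 50.0, 100.0]   # cumulative % passing
d50 = d_value(diameters, undersize, 50)
d25 = d_value(diameters, undersize, 25)
```

Flat stretches in the cumulative curve make the quantile ambiguous, which is one reason the cleaned columns described later on this page are usually the ones to query.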
Distribution Data
| Column | Description |
|---|---|
| diameter | Particle diameter in microns (um) |
| q | Probability density at diameter |
| undersize | Cumulative percentage passing |
Data Pipeline
SharePoint Source
- Site: PSA Data
- Folder: Root (/)
- File Type: CSV
File Naming Convention
Files follow the pattern: [timestamp]-[sample_id].csv or [sample_id]-[run_number].csv
Timestamp format: YYYYMMDDHHmm (e.g., 202312151430)
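The two naming patterns can be told apart with a pair of regular expressions. The sketch below is a hypothetical parser, not the pipeline's own code: `parse_psa_filename` and its return keys are illustrative names, and it assumes that a leading 12-digit block is always a timestamp and that a trailing all-digit segment is always a run number.

```python
import re
from datetime import datetime

# Assumed patterns: a 12-digit timestamp prefix, or a trailing integer run number.
_TS_FIRST = re.compile(r"^(?P<ts>\d{12})-(?P<sample>.+)$")
_RUN_LAST = re.compile(r"^(?P<sample>.+)-(?P<run>\d+)$")

def parse_psa_filename(filename):
    """Split a PSA CSV filename into sample ID, timestamp, and run number."""
    stem = filename[:-4] if filename.lower().endswith(".csv") else filename
    match = _TS_FIRST.match(stem)
    if match:
        return {
            "sample_id": match["sample"],
            "sample_date": datetime.strptime(match["ts"], "%Y%m%d%H%M"),
            "run_nb": None,
        }
    match = _RUN_LAST.match(stem)
    if match:
        return {
            "sample_id": match["sample"],
            "sample_date": None,
            "run_nb": int(match["run"]),
        }
    # Neither pattern matched: keep the stem as the sample ID.
    return {"sample_id": stem, "sample_date": None, "run_nb": None}

by_timestamp = parse_psa_filename("202312151430-P800-001-1-QBND-FM.csv")
by_run = parse_psa_filename("P800-001-1-QBND-FM-2.csv")
```

Note the ambiguity in the second pattern: a sample ID that itself ends in a numeric segment would be misread as having a run number, so a real parser needs an extra rule or a lookup to disambiguate.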
Dagster Assets
The PSA data pipeline consists of the following assets in apps/datasmart/src/datasmart/assets/analytical/psa.py:
psa_raw (sharepoint_multi_asset)
├── psa_samples_raw (backend.psa_samples_raw)
└── psa_data_raw (backend.psa_data_raw)
↓
psa (multi_asset)
├── psa_samples (analytical.psa_samples)
└── psa_data (analytical.psa_data)
↓
psa_mean_data (analytical.psa_mean_data)
↓
psa_statistics (analytical.psa_statistics)
↓
psa_mastersheet (SharePoint Excel export)
Asset Descriptions
| Asset | Schema | Description |
|---|---|---|
| psa_samples_raw | backend | Sample metadata from raw files |
| psa_data_raw | backend | Raw distribution data (diameter, q, undersize) |
| psa_samples | analytical | Cleaned sample metadata with normalized IDs |
| psa_data | analytical | Distribution data with cleaned columns |
| psa_mean_data | analytical | Averaged distributions for replicates |
| psa_statistics | analytical | D-values for each sample and mean |
| psa_mastersheet | SharePoint | Excel export of statistics |
Database Tables
analytical.psa_samples
Sample metadata table.
| Column | Type | Description |
|---|---|---|
| file_id | string | SharePoint file identifier |
| original_sample_id | string | Raw sample ID from filename |
| sample_id | string | Normalized sample ID |
| base_sample_id | string | Sample ID without replicate suffix |
| sample_date | datetime | Timestamp from filename |
| run_nb | int | Run number (if multiple runs) |
| replicate | string | Replicate indicator (R1, R2, etc.) |
| sp_site | string | SharePoint site |
| file_path | string | File path |
| file_url | string | SharePoint URL |
| last_update | datetime | Last modification time |
analytical.psa_data
Distribution data table.
| Column | Type | Description |
|---|---|---|
| file_id | string | SharePoint file identifier |
| diameter | float | Particle diameter in microns (um) |
| q | float | Probability density at diameter |
| undersize | float | Cumulative % passing |
| q_cleaned | float | Cleaned probability density |
| undersize_cleaned | float | Cleaned cumulative % |
analytical.psa_statistics
Quantile statistics table containing D-values for each sample.
| Column | Type | Description |
|---|---|---|
| sample_id | string | Sample or base sample ID |
| base_sample_id | string | Base sample ID |
| date | date | Sample date |
| replicate | string | Replicate number or "Mean" |
| run_nb | string | Run number or "Mean" |
| file_id | string | Source file (null for mean) |
| D5 - D95 | float | Quantile diameters (um) |
| D5_cleaned - D95_cleaned | float | Cleaned quantile diameters (um) |
analytical.psa_mean_data
Averaged distribution data for samples with multiple replicates.
| Column | Type | Description |
|---|---|---|
| sample_id | string | Base sample ID |
| diameter | float | Particle diameter (um) |
| q | float | Averaged probability density |
| undersize | float | Averaged cumulative % |
| q_cleaned | float | Averaged cleaned probability density |
| undersize_cleaned | float | Averaged cleaned cumulative % |
Data Cleaning
Plateau Removal Algorithm
The pipeline includes a data cleaning algorithm that removes erroneous plateau regions at large particle sizes. This addresses instrument artifacts that can occur above ~600 um.
Parameters:
- threshold = 0.01 - % change below which a region is considered a plateau
- min_length = 3 - Minimum number of consecutive points to identify a plateau
- min_diameter = 592 - Only check above this diameter (~600 um)
- min_undersize = 5 - Skip the first 5% of the distribution
- max_undersize = 96 - Skip the last 4% of the distribution
Algorithm:
- Identify plateau regions (low q values) above 600 um
- If plateau extends for 3+ consecutive points, truncate the distribution
- Rescale undersize values to 100%
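The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the function name `remove_plateau` is hypothetical, a plateau point is taken to be one whose cumulative % rises by less than `threshold` relative to the previous point, and points from the plateau onward are set to 100% after rescaling.

```python
import numpy as np

def remove_plateau(diameter, undersize, threshold=0.01, min_length=3,
                   min_diameter=592.0, min_undersize=5.0, max_undersize=96.0):
    """Return a cleaned copy of `undersize` with a large-diameter plateau removed."""
    d = np.asarray(diameter, dtype=float)
    u = np.asarray(undersize, dtype=float)
    # Change in cumulative % relative to the previous point (0 for the first point).
    step = np.diff(u, prepend=u[0])
    # Candidate plateau points: nearly flat, large diameter, inside the checked band.
    flat = ((step < threshold) & (d >= min_diameter)
            & (u >= min_undersize) & (u <= max_undersize))
    run = 0
    for i, is_flat in enumerate(flat):
        run = run + 1 if is_flat else 0
        if run >= min_length:
            start = i - min_length + 1      # index of the first plateau point
            level = u[start]                # cumulative % where the curve stalls
            index = np.arange(len(u))
            # Assumption: rescale everything before the plateau so it ends at 100%,
            # and pin the plateau and everything after it to 100%.
            return np.where(index < start, u * 100.0 / level, 100.0)
    return u.copy()                         # no plateau found: leave unchanged

diameter = [100, 300, 500, 600, 700, 800, 900, 1000]
raw = [10, 40, 70, 80, 80, 80, 80, 100]
cleaned = remove_plateau(diameter, raw)
```

In this toy input the curve stalls at 80% from 700 um to 900 um, so the distribution is truncated there and the earlier points are scaled by 100/80.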
Raw vs Cleaned Data
- Raw data (undersize, q): Original instrument values
- Cleaned data (undersize_cleaned, q_cleaned): Plateau regions removed and rescaled
- Cleaned statistics (D50_cleaned, etc.): Calculated from the cleaned distribution
Replicate Handling
Base Sample ID
The base_sample_id strips replicate indicators from sample IDs:
P800-001-1-QBND-FM-R1 -> P800-001-1-QBND-FM
P800-001-1-QBND-FM-R2 -> P800-001-1-QBND-FM
For PIL/TLP samples, reactor designations (R###) in the 4th position are preserved:
PIL13-441-121620-R204 -> PIL13-441-121620-R204 (kept as-is)
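One way to express this rule is sketched below. The function name and the exact conditions are assumptions for illustration: a trailing `-R` followed by digits is treated as a replicate suffix, unless the ID starts with PIL or TLP, has exactly four segments, and the fourth is `R` plus three digits.

```python
import re

_REPLICATE_SUFFIX = re.compile(r"-R\d+$")
_REACTOR = re.compile(r"R\d{3}")

def base_sample_id(sample_id):
    """Strip a trailing replicate indicator, preserving PIL/TLP reactor codes."""
    parts = sample_id.split("-")
    is_pilot = parts[0].startswith(("PIL", "TLP"))
    # A 4th-position R### on a PIL/TLP sample is a reactor, not a replicate.
    if is_pilot and len(parts) == 4 and _REACTOR.fullmatch(parts[3]):
        return sample_id
    return _REPLICATE_SUFFIX.sub("", sample_id)

examples = {
    sid: base_sample_id(sid)
    for sid in (
        "P800-001-1-QBND-FM-R1",
        "P800-001-1-QBND-FM-R2",
        "PIL13-441-121620-R204",
    )
}
```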
Mean Calculation
For samples with multiple replicates:
- Group by base_sample_id
- Verify all replicates have the same diameter grid
- Average q and undersize values at each diameter
- Calculate D-values from the averaged distribution
- Store as replicate = "Mean" in psa_statistics
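The grouping and averaging steps might look like the following pandas sketch. It assumes a single input frame that already carries `base_sample_id` and `file_id` alongside the distribution columns (in the database these live in separate tables), and the function name `mean_distribution` is hypothetical. Samples with only one replicate, or with mismatched diameter grids, are skipped.

```python
import pandas as pd

def mean_distribution(df):
    """Average q and undersize across replicates that share a diameter grid.

    Expects columns: base_sample_id, file_id, diameter, q, undersize.
    """
    frames = []
    for base_id, group in df.groupby("base_sample_id"):
        # One diameter grid per replicate file; all grids must be identical.
        grids = {
            tuple(rep["diameter"].sort_values())
            for _, rep in group.groupby("file_id")
        }
        if group["file_id"].nunique() < 2 or len(grids) != 1:
            continue
        mean = group.groupby("diameter", as_index=False)[["q", "undersize"]].mean()
        mean.insert(0, "sample_id", base_id)
        frames.append(mean)
    columns = ["sample_id", "diameter", "q", "undersize"]
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)

replicates = pd.DataFrame({
    "base_sample_id": ["A"] * 4 + ["B"] * 4,
    "file_id": ["a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"],
    "diameter": [1.0, 10.0, 1.0, 10.0, 1.0, 10.0, 2.0, 10.0],
    "q": [1.0, 3.0, 3.0, 5.0, 1.0, 1.0, 1.0, 1.0],
    "undersize": [10.0, 90.0, 30.0, 100.0, 5.0, 95.0, 5.0, 95.0],
})
means = mean_distribution(replicates)
```

Here sample B is dropped because its two replicates were measured on different diameter grids.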
Usage Examples
Query D50 Values
from shared.db.sql import SQL
# Get mean D50 for all samples
mean_d50 = SQL.read("""
SELECT sample_id, D50_cleaned
FROM analytical.psa_statistics
WHERE replicate = 'Mean'
ORDER BY sample_id
""")
# Get D50 for specific project team
d50_values = SQL.read("""
SELECT sample_id, D50_cleaned
FROM analytical.psa_statistics
WHERE sample_id LIKE 'P800%'
AND replicate = 'Mean'
""")
Get Full Distribution
distribution = SQL.read("""
SELECT d.diameter, d.undersize_cleaned as undersize
FROM analytical.psa_data d
JOIN analytical.psa_samples s ON d.file_id = s.file_id
WHERE s.sample_id = 'P800-001-1-QBND-FM'
ORDER BY d.diameter
""")
Compare Replicates
replicates = SQL.read("""
SELECT sample_id, replicate, run_nb, D50, D50_cleaned
FROM analytical.psa_statistics
WHERE base_sample_id = 'P800-001-1-QBND-FM'
ORDER BY replicate
""")
Join with Other Analytical Data
combined = SQL.read("""
SELECT
p.sample_id,
p.D50_cleaned as D50,
x.CaO, x.MgO, x.SiO2
FROM analytical.psa_statistics p
LEFT JOIN analytical.xrf_simplified x
ON p.sample_id = x.sample_id
WHERE p.replicate = 'Mean'
""")
Visualization
Plot Distribution Curve
import matplotlib.pyplot as plt
from shared.db.sql import SQL
dist = SQL.read("""
SELECT d.diameter, d.undersize_cleaned as undersize
FROM analytical.psa_data d
JOIN analytical.psa_samples s ON d.file_id = s.file_id
WHERE s.sample_id = 'P800-001-1-QBND-FM'
ORDER BY d.diameter
""")
plt.figure(figsize=(10, 6))
plt.semilogx(dist['diameter'], dist['undersize'])
plt.xlabel('Particle Diameter (um)')
plt.ylabel('Cumulative % Passing')
plt.title('Particle Size Distribution')
plt.grid(True, alpha=0.3)
plt.axhline(y=50, color='r', linestyle='--', label='D50')
plt.legend()
plt.show()
Compare Raw vs Cleaned
import matplotlib.pyplot as plt
from shared.db.sql import SQL
dist = SQL.read("""
SELECT d.diameter, d.undersize, d.undersize_cleaned
FROM analytical.psa_data d
JOIN analytical.psa_samples s ON d.file_id = s.file_id
WHERE s.sample_id = 'P800-001-1-QBND-FM'
ORDER BY d.diameter
""")
plt.figure(figsize=(10, 6))
plt.semilogx(dist['diameter'], dist['undersize'], label='Raw', alpha=0.7)
plt.semilogx(dist['diameter'], dist['undersize_cleaned'], label='Cleaned')
plt.xlabel('Particle Diameter (um)')
plt.ylabel('Cumulative % Passing')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
SharePoint Mastersheet
The psa_mastersheet asset exports statistics to SharePoint as PSA_Mastersheet.xlsx:
- Sheet 1: Raw D-values
- Sheet 2: Cleaned D-values
- Formatting: Protected worksheet with sorting/filtering enabled
- Columns: Sample ID, Replicate, Run, Date, D5-D95
- Sorting: Most recent date first, then by sample ID
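For reference, the sort order described above corresponds to a two-key sort in pandas. This is only a sketch of the ordering, assuming the `date` and `sample_id` column names from psa_statistics; `sort_for_mastersheet` is an illustrative name and the export code itself is not shown here.

```python
import pandas as pd

def sort_for_mastersheet(stats):
    """Order rows by most recent date first, then by sample ID ascending."""
    return stats.sort_values(
        ["date", "sample_id"], ascending=[False, True]
    ).reset_index(drop=True)

stats = pd.DataFrame({
    "sample_id": ["P800-002", "P800-001", "P800-003"],
    "date": pd.to_datetime(["2023-12-15", "2023-12-15", "2023-11-01"]),
    "D50": [12.0, 11.5, 14.2],
})
ordered = sort_for_mastersheet(stats)
```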
Related Assets
| Asset | Description |
|---|---|
| sample_id_crosswalk | Maps messy sample IDs to canonical format |
| scm_firstlast_crosswalk | SCM sample ID mappings (first-last format) |
| project_team_map | Maps old process area codes to new project team codes |
Troubleshooting
Missing Samples
If a sample is missing from psa_statistics:
- Check if it exists in analytical.psa_samples
- Verify sample ID normalization succeeded (not NULL)
- Check if the source file exists in the SharePoint PSA Data site
- Look for parsing errors in backend.psa_samples_raw
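The first two checks can be combined into a single anti-join once both tables are loaded into DataFrames. The sketch below is an assumption-laden example: `find_missing_samples` and the `normalization_failed` column are hypothetical names, and the inputs stand in for the results of querying analytical.psa_samples and analytical.psa_statistics.

```python
import pandas as pd

def find_missing_samples(samples, statistics):
    """Return rows of `samples` whose sample_id has no entry in `statistics`."""
    known = statistics[["sample_id"]].drop_duplicates()
    merged = samples.merge(known, on="sample_id", how="left", indicator=True)
    missing = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    # A NULL sample_id points at a normalization failure rather than a missing run.
    missing = missing.assign(normalization_failed=missing["sample_id"].isna())
    return missing.reset_index(drop=True)

samples = pd.DataFrame({
    "file_id": ["f1", "f2", "f3"],
    "sample_id": ["A", "B", None],
})
statistics = pd.DataFrame({"sample_id": ["A", "A"], "replicate": ["R1", "Mean"]})
missing = find_missing_samples(samples, statistics)
```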
Invalid Distributions
If D-values look wrong:
- Compare raw vs cleaned values - cleaning may have truncated too much
- Check the distribution curve for anomalies
- Verify the original CSV file format
Replicate Mismatch
If mean values are missing:
- Check that replicates have matching diameter grids
- Replicates with different grids cannot be averaged
- Review individual replicates in psa_statistics where replicate != 'Mean'