
PSA - Particle Size Analysis

Particle Size Analysis (PSA) uses laser diffraction to measure the grain size distribution of powdered materials. It provides cumulative distribution curves and quantile statistics (D-values) that characterize the particle size distribution.

Method Overview

| Property | Value |
| --- | --- |
| Full Name | Particle Size Analysis (Laser Diffraction) |
| Purpose | Measures grain size distribution of powdered materials |
| Output | Cumulative distribution curve and quantile statistics |
| Output Units | Particle diameter in microns (um) |
| SharePoint Location | PSA Data |
| File Format | CSV |

Key Metrics

Each D-value (DX) is the particle diameter below which X% of the distribution falls, as read from the cumulative undersize curve:

| Metric | Description |
| --- | --- |
| D5 | 5% of particles are smaller |
| D10 | 10% of particles are smaller |
| D20 | 20% of particles are smaller |
| D30 | 30% of particles are smaller |
| D40 | 40% of particles are smaller |
| D50 | Median particle size (50% smaller) |
| D60 | 60% of particles are smaller |
| D70 | 70% of particles are smaller |
| D80 | 80% of particles are smaller |
| D90 | 90% of particles are smaller |
| D95 | 95% of particles are smaller |
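
To illustrate how a D-value can be read off a cumulative curve, the sketch below interpolates the diameter at which the undersize curve crosses a target percentage. The function name `d_value` and the use of linear interpolation in diameter are assumptions made for illustration; the pipeline may interpolate differently (for example, in log-diameter).

```python
from bisect import bisect_left


def d_value(diameters, undersize, pct):
    """Diameter at which the cumulative undersize curve reaches pct.

    diameters: ascending particle diameters (um)
    undersize: cumulative % passing at each diameter, non-decreasing
    Returns None when pct lies outside the measured curve.
    """
    if not diameters or len(diameters) != len(undersize):
        raise ValueError("diameters and undersize must be non-empty and equal length")
    if pct < undersize[0] or pct > undersize[-1]:
        return None
    i = bisect_left(undersize, pct)  # first index with undersize[i] >= pct
    if undersize[i] == pct:
        return diameters[i]
    # Linear interpolation between the two bracketing grid points (assumption).
    x0, x1 = diameters[i - 1], diameters[i]
    y0, y1 = undersize[i - 1], undersize[i]
    return x0 + (x1 - x0) * (pct - y0) / (y1 - y0)


# Made-up curve for illustration only.
example_d50 = d_value([1, 10, 100, 1000], [0, 20, 80, 100], 50)
```

With the made-up curve above, the 50% crossing falls between the 10 um and 100 um grid points, so the result is interpolated between them.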

Distribution Data

| Column | Description |
| --- | --- |
| diameter | Particle diameter in microns (um) |
| q | Probability density at diameter |
| undersize | Cumulative percentage passing |

Data Pipeline

SharePoint Source

  • Site: PSA Data
  • Folder: Root (/)
  • File Type: CSV

File Naming Convention

Files follow the pattern: [timestamp]-[sample_id].csv or [sample_id]-[run_number].csv

Timestamp format: YYYYMMDDHHmm (e.g., 202312151430)
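
As a rough illustration of the two naming patterns, the sketch below tries the timestamp-first form and falls back to the run-number form. The function name `parse_psa_filename`, the regular expressions, and the returned keys are assumptions for illustration, not the pipeline's actual parser.

```python
import re
from datetime import datetime

# [timestamp]-[sample_id].csv, with a 12-digit YYYYMMDDHHmm timestamp
TIMESTAMP_FIRST = re.compile(r"^(?P<ts>\d{12})-(?P<sample_id>.+)\.csv$", re.IGNORECASE)
# [sample_id]-[run_number].csv, with a purely numeric final segment
RUN_LAST = re.compile(r"^(?P<sample_id>.+)-(?P<run>\d+)\.csv$", re.IGNORECASE)


def parse_psa_filename(name):
    """Return a dict with sample_id, sample_date, run_nb, or None if no pattern fits."""
    m = TIMESTAMP_FIRST.match(name)
    if m:
        try:
            ts = datetime.strptime(m.group("ts"), "%Y%m%d%H%M")
        except ValueError:
            ts = None  # 12 digits that do not form a valid date
        if ts is not None:
            return {"sample_id": m.group("sample_id"), "sample_date": ts, "run_nb": None}
    m = RUN_LAST.match(name)
    if m:
        return {
            "sample_id": m.group("sample_id"),
            "sample_date": None,
            "run_nb": int(m.group("run")),
        }
    return None
```

A sample ID whose last segment is purely numeric is indistinguishable from a run number under this scheme, so a real parser would need an extra rule for such cases.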

Dagster Assets

The PSA data pipeline consists of the following assets in apps/datasmart/src/datasmart/assets/analytical/psa.py:

psa_raw (sharepoint_multi_asset)
├── psa_samples_raw (backend.psa_samples_raw)
└── psa_data_raw (backend.psa_data_raw)

psa (multi_asset)
├── psa_samples (analytical.psa_samples)
└── psa_data (analytical.psa_data)

psa_mean_data (analytical.psa_mean_data)

psa_statistics (analytical.psa_statistics)

psa_mastersheet (SharePoint Excel export)

Asset Descriptions

| Asset | Schema | Description |
| --- | --- | --- |
| psa_samples_raw | backend | Sample metadata from raw files |
| psa_data_raw | backend | Raw distribution data (diameter, q, undersize) |
| psa_samples | analytical | Cleaned sample metadata with normalized IDs |
| psa_data | analytical | Distribution data with cleaned columns |
| psa_mean_data | analytical | Averaged distributions for replicates |
| psa_statistics | analytical | D-values for each sample and mean |
| psa_mastersheet | SharePoint | Excel export of statistics |

Database Tables

analytical.psa_samples

Sample metadata table.

| Column | Type | Description |
| --- | --- | --- |
| file_id | string | SharePoint file identifier |
| original_sample_id | string | Raw sample ID from filename |
| sample_id | string | Normalized sample ID |
| base_sample_id | string | Sample ID without replicate suffix |
| sample_date | datetime | Timestamp from filename |
| run_nb | int | Run number (if multiple runs) |
| replicate | string | Replicate indicator (R1, R2, etc.) |
| sp_site | string | SharePoint site |
| file_path | string | File path |
| file_url | string | SharePoint URL |
| last_update | datetime | Last modification time |

analytical.psa_data

Distribution data table.

| Column | Type | Description |
| --- | --- | --- |
| file_id | string | SharePoint file identifier |
| diameter | float | Particle diameter in microns (um) |
| q | float | Probability density at diameter |
| undersize | float | Cumulative % passing |
| q_cleaned | float | Cleaned probability density |
| undersize_cleaned | float | Cleaned cumulative % |

analytical.psa_statistics

Quantile statistics table containing D-values for each sample.

| Column | Type | Description |
| --- | --- | --- |
| sample_id | string | Sample or base sample ID |
| base_sample_id | string | Base sample ID |
| date | date | Sample date |
| replicate | string | Replicate number or "Mean" |
| run_nb | string | Run number or "Mean" |
| file_id | string | Source file (null for mean) |
| D5 - D95 | float | Quantile diameters (um) |
| D5_cleaned - D95_cleaned | float | Cleaned quantile diameters (um) |

analytical.psa_mean_data

Averaged distribution data for samples with multiple replicates.

| Column | Type | Description |
| --- | --- | --- |
| sample_id | string | Base sample ID |
| diameter | float | Particle diameter (um) |
| q | float | Averaged probability density |
| undersize | float | Averaged cumulative % |
| q_cleaned | float | Averaged cleaned probability density |
| undersize_cleaned | float | Averaged cleaned cumulative % |

Data Cleaning

Plateau Removal Algorithm

The pipeline includes a data cleaning algorithm that removes erroneous plateau regions at large particle sizes. This addresses instrument artifacts that can occur above ~600 um.

Parameters:

  • threshold = 0.01 - Change (in %) below which consecutive points are treated as a plateau
  • min_length = 3 - Minimum number of consecutive points needed to identify a plateau
  • min_diameter = 592 - Only check diameters above this value (592 um, roughly 600 um)
  • min_undersize = 5 - Skip the first 5% of the distribution
  • max_undersize = 96 - Skip the last 4% of the distribution

Algorithm:

  1. Identify plateau regions (low q values) above min_diameter (roughly 600 um)
  2. If plateau extends for 3+ consecutive points, truncate the distribution
  3. Rescale undersize values to 100%
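
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the function name `remove_plateau` is hypothetical, a point is treated as flat when its undersize differs from the previous point by less than `threshold`, the curve is cut at the start of the first qualifying run, and the kept values are rescaled so the last one equals 100%.

```python
def remove_plateau(diameters, undersize, threshold=0.01, min_length=3,
                   min_diameter=592, min_undersize=5, max_undersize=96):
    """Return (diameters, undersize_cleaned) with the first qualifying plateau cut off."""
    run_start = None  # index of the first flat point in the current run
    cut = None        # index at which the curve is truncated
    for i in range(1, len(diameters)):
        # Assumption: "flat" means the undersize change from the previous point
        # is below threshold, inside the diameter and undersize windows.
        flat = (
            diameters[i] > min_diameter
            and min_undersize <= undersize[i] <= max_undersize
            and abs(undersize[i] - undersize[i - 1]) < threshold
        )
        if not flat:
            run_start = None
            continue
        if run_start is None:
            run_start = i
        if i - run_start + 1 >= min_length:
            cut = run_start
            break
    if cut is None:
        return list(diameters), list(undersize)  # nothing to clean
    top = undersize[cut - 1]  # plateau level, which becomes the new 100%
    if top <= 0:
        return list(diameters), list(undersize)  # cannot rescale
    kept_d = list(diameters[:cut])
    kept_u = [u * 100.0 / top for u in undersize[:cut]]
    return kept_d, kept_u
```

For a made-up curve that rises to 90%, stays flat over several points above 592 um, and then jumps to 100%, this sketch keeps only the points up to the start of the flat run and rescales them so the curve ends at 100%.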

Raw vs Cleaned Data

  • Raw data (undersize, q): Original instrument values
  • Cleaned data (undersize_cleaned, q_cleaned): Plateau regions removed and rescaled
  • Cleaned statistics (D50_cleaned, etc.): Calculated from cleaned distribution

Replicate Handling

Base Sample ID

The base_sample_id strips replicate indicators from sample IDs:

P800-001-1-QBND-FM-R1 -> P800-001-1-QBND-FM
P800-001-1-QBND-FM-R2 -> P800-001-1-QBND-FM

For PIL/TLP samples, reactor designations (R###) in the 4th position are preserved:

PIL13-441-121620-R204 -> PIL13-441-121620-R204 (kept as-is)
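
A minimal sketch of this rule is shown below. The function name `derive_base_sample_id`, the regular expression, and the prefix check used to detect PIL/TLP samples are assumptions for illustration; the pipeline's normalization logic may handle more cases.

```python
import re

# Trailing replicate indicator such as -R1 or -R2
REPLICATE_SUFFIX = re.compile(r"-R\d+$")


def derive_base_sample_id(sample_id):
    """Strip a trailing replicate indicator, keeping PIL/TLP reactor designations."""
    parts = sample_id.split("-")
    # Assumption: PIL/TLP IDs are recognized by prefix, and a 4-part ID ends
    # in the reactor designation (R###), which must not be stripped.
    if sample_id.startswith(("PIL", "TLP")) and len(parts) == 4:
        return sample_id
    return REPLICATE_SUFFIX.sub("", sample_id)
```

Under these assumptions, a PIL/TLP ID with a fifth segment such as `-R1` would still lose that replicate suffix while keeping the reactor designation.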

Mean Calculation

For samples with multiple replicates:

  1. Group by base_sample_id
  2. Verify all replicates have the same diameter grid
  3. Average q and undersize values at each diameter
  4. Calculate D-values from the averaged distribution
  5. Store as replicate = "Mean" in psa_statistics
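
Steps 2 and 3 can be sketched as a pointwise average that refuses to run when the grids differ. The function name `mean_distribution` and the dict-of-lists input shape are assumptions for illustration only.

```python
def mean_distribution(replicates):
    """Average replicate curves that share one diameter grid.

    replicates: list of dicts with keys 'diameter', 'q', 'undersize'
                (equal-length lists).
    Returns a dict with the same keys, or None when the grids differ.
    """
    if not replicates:
        return None
    grid = list(replicates[0]["diameter"])
    if any(list(r["diameter"]) != grid for r in replicates):
        return None  # mismatched grids cannot be averaged point by point
    n = len(replicates)
    points = range(len(grid))
    return {
        "diameter": grid,
        "q": [sum(r["q"][i] for r in replicates) / n for i in points],
        "undersize": [sum(r["undersize"][i] for r in replicates) / n for i in points],
    }
```

Returning None for mismatched grids mirrors the behaviour described in Troubleshooting, where such samples end up without mean values.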

Usage Examples

Query D50 Values

from shared.db.sql import SQL

# Get mean D50 for all samples
mean_d50 = SQL.read("""
SELECT sample_id, D50_cleaned
FROM analytical.psa_statistics
WHERE replicate = 'Mean'
ORDER BY sample_id
""")

# Get D50 for specific project team
d50_values = SQL.read("""
SELECT sample_id, D50_cleaned
FROM analytical.psa_statistics
WHERE sample_id LIKE 'P800%'
AND replicate = 'Mean'
""")

Get Full Distribution

distribution = SQL.read("""
SELECT d.diameter, d.undersize_cleaned as undersize
FROM analytical.psa_data d
JOIN analytical.psa_samples s ON d.file_id = s.file_id
WHERE s.sample_id = 'P800-001-1-QBND-FM'
ORDER BY d.diameter
""")

Compare Replicates

replicates = SQL.read("""
SELECT sample_id, replicate, run_nb, D50, D50_cleaned
FROM analytical.psa_statistics
WHERE base_sample_id = 'P800-001-1-QBND-FM'
ORDER BY replicate
""")

Join with Other Analytical Data

combined = SQL.read("""
    SELECT
        p.sample_id,
        p.D50_cleaned as D50,
        x.CaO, x.MgO, x.SiO2
    FROM analytical.psa_statistics p
    LEFT JOIN analytical.xrf_simplified x
        ON p.sample_id = x.sample_id
    WHERE p.replicate = 'Mean'
""")

Visualization

Plot Distribution Curve

import matplotlib.pyplot as plt
from shared.db.sql import SQL

dist = SQL.read("""
SELECT d.diameter, d.undersize_cleaned as undersize
FROM analytical.psa_data d
JOIN analytical.psa_samples s ON d.file_id = s.file_id
WHERE s.sample_id = 'P800-001-1-QBND-FM'
ORDER BY d.diameter
""")

plt.figure(figsize=(10, 6))
plt.semilogx(dist['diameter'], dist['undersize'])
plt.xlabel('Particle Diameter (um)')
plt.ylabel('Cumulative % Passing')
plt.title('Particle Size Distribution')
plt.grid(True, alpha=0.3)
plt.axhline(y=50, color='r', linestyle='--', label='D50')
plt.legend()
plt.show()

Compare Raw vs Cleaned

dist = SQL.read("""
SELECT d.diameter, d.undersize, d.undersize_cleaned
FROM analytical.psa_data d
JOIN analytical.psa_samples s ON d.file_id = s.file_id
WHERE s.sample_id = 'P800-001-1-QBND-FM'
ORDER BY d.diameter
""")

plt.figure(figsize=(10, 6))
plt.semilogx(dist['diameter'], dist['undersize'], label='Raw', alpha=0.7)
plt.semilogx(dist['diameter'], dist['undersize_cleaned'], label='Cleaned')
plt.xlabel('Particle Diameter (um)')
plt.ylabel('Cumulative % Passing')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

SharePoint Mastersheet

The psa_mastersheet asset exports statistics to SharePoint as PSA_Mastersheet.xlsx:

  • Sheet 1: Raw D-values
  • Sheet 2: Cleaned D-values
  • Formatting: Protected worksheet with sorting/filtering enabled
  • Columns: Sample ID, Replicate, Run, Date, D5-D95
  • Sorting: Most recent date first, then by sample ID
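
The sort order described above (newest date first, then sample ID) can be reproduced with two stable sorts. The function name `sort_for_mastersheet` and the row layout are hypothetical and only illustrate the ordering, not the export code itself.

```python
from datetime import date


def sort_for_mastersheet(rows):
    """Order rows by date (newest first), then by sample_id (ascending)."""
    # Python's sort is stable, so sort by the secondary key first.
    by_sample = sorted(rows, key=lambda r: r["sample_id"])
    return sorted(by_sample, key=lambda r: r["date"], reverse=True)


# Made-up rows for illustration only.
rows = [
    {"sample_id": "B", "date": date(2023, 12, 1)},
    {"sample_id": "A", "date": date(2023, 12, 15)},
    {"sample_id": "C", "date": date(2023, 12, 15)},
    {"sample_id": "A", "date": date(2023, 12, 1)},
]
ordered = sort_for_mastersheet(rows)
```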

| Asset | Description |
| --- | --- |
| sample_id_crosswalk | Maps messy sample IDs to canonical format |
| scm_firstlast_crosswalk | SCM sample ID mappings (first-last format) |
| project_team_map | Maps old process area codes to new project team codes |

Troubleshooting

Missing Samples

If a sample is missing from psa_statistics:

  1. Check if it exists in analytical.psa_samples
  2. Verify sample ID normalization succeeded (not NULL)
  3. Check if the source file exists in SharePoint PSA Data site
  4. Look for parsing errors in backend.psa_samples_raw

Invalid Distributions

If D-values look wrong:

  1. Compare raw vs cleaned values - cleaning may have truncated too much
  2. Check the distribution curve for anomalies
  3. Verify the original CSV file format

Replicate Mismatch

If mean values are missing:

  1. Check that replicates have matching diameter grids
  2. Replicates with different grids cannot be averaged
  3. Review individual replicates in psa_statistics where replicate != 'Mean'