FTIR - Fourier Transform Infrared Spectroscopy
Fourier Transform Infrared (FTIR) Spectroscopy is used to identify molecular structures and chemical bonds through infrared absorption. It provides spectral data showing absorbance as a function of wavenumber.
Method Overview
| Property | Value |
|---|---|
| Full Name | Fourier Transform Infrared Spectroscopy |
| Purpose | Identifies molecular structures and chemical bonds through infrared absorption |
| Output | Absorbance vs. wavenumber spectra |
| Units | Wavenumber (cm⁻¹), Absorbance (arbitrary units) |
| SharePoint Location | Analytical Data > FTIR > FTIR-Data |
| File Format | SPA (Thermo Fisher OMNIC binary format) |
Typical Wavenumber Ranges
| Range (cm⁻¹) | Functional Groups |
|---|---|
| 4000-2500 | O-H, N-H, C-H stretching |
| 2500-2000 | Triple bonds (C≡C, C≡N) |
| 2000-1500 | Double bonds (C=O, C=C, C=N) |
| 1500-400 | Fingerprint region (complex vibrations) |
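The ranges above can be expressed as a simple lookup for tagging positions in a spectrum. This is a minimal sketch. The names `FTIR_REGIONS` and `region_for` are hypothetical. Treating each range as half-open `[low, high)`, with 4000 cm⁻¹ included in the top range, is also an assumption, because real band boundaries are approximate.

```python
from typing import Optional

# Hypothetical lookup built from the "Typical Wavenumber Ranges" table.
# Each entry is (low, high, label) in cm^-1; boundaries are approximate.
FTIR_REGIONS = [
    (2500.0, 4000.0, "O-H, N-H, C-H stretching"),
    (2000.0, 2500.0, "Triple bonds (C≡C, C≡N)"),
    (1500.0, 2000.0, "Double bonds (C=O, C=C, C=N)"),
    (400.0, 1500.0, "Fingerprint region (complex vibrations)"),
]


def region_for(wavenumber: float) -> Optional[str]:
    """Return the functional-group region for a wavenumber in cm^-1.

    Ranges are treated as [low, high) (an assumption); 4000 cm^-1 itself is
    placed in the top range. Values outside 400-4000 cm^-1 return None.
    """
    for low, high, label in FTIR_REGIONS:
        if low <= wavenumber < high or wavenumber == high == 4000.0:
            return label
    return None
```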
Common Peaks in Materials Science
| Wavenumber (cm⁻¹) | Assignment |
|---|---|
| ~3400 | O-H stretch (water, hydroxides) |
| ~2900 | C-H stretch (organics) |
| ~1650 | H-O-H bending (water) |
| ~1400-1500 | CO₃²⁻ asymmetric stretch (carbonates) |
| ~870 | CO₃²⁻ out-of-plane bending |
| ~700-800 | Metal-O vibrations |
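One way to match measured bands against this table is to find local maxima in a spectrum and look each one up in the listed assignments. This is a minimal sketch, and it makes several assumptions. Spectra are NumPy arrays of wavenumber and absorbance. The single approximate (~) positions are widened by a hypothetical ±50 cm⁻¹ tolerance, and `min_height` is a hypothetical noise threshold. The peak test is a naive neighbour comparison rather than a dedicated routine such as `scipy.signal.find_peaks`, so noisy spectra would need smoothing first.

```python
import numpy as np

# Reference bands from the "Common Peaks" table as (low, high, assignment).
# Single approximate values are widened by a hypothetical +/-50 cm^-1 tolerance.
TOL = 50.0
REFERENCE_BANDS = [
    (3400 - TOL, 3400 + TOL, "O-H stretch (water, hydroxides)"),
    (2900 - TOL, 2900 + TOL, "C-H stretch (organics)"),
    (1650 - TOL, 1650 + TOL, "H-O-H bending (water)"),
    (1400, 1500, "CO₃²⁻ asymmetric stretch (carbonates)"),
    (870 - TOL, 870 + TOL, "CO₃²⁻ out-of-plane bending"),
    (700, 800, "Metal-O vibrations"),
]


def find_peaks_simple(wavenumber, absorbance, min_height=0.0):
    """Return wavenumbers of strict local maxima whose absorbance exceeds min_height."""
    wn = np.asarray(wavenumber, dtype=float)
    ab = np.asarray(absorbance, dtype=float)
    # An interior point is a peak if it is higher than both neighbours.
    mid = ab[1:-1]
    is_peak = (mid > ab[:-2]) & (mid > ab[2:]) & (mid > min_height)
    return wn[1:-1][is_peak]


def assign_peaks(peak_wavenumbers):
    """Map each peak position to the first reference band containing it (or None)."""
    result = {}
    for wn in peak_wavenumbers:
        matches = [label for low, high, label in REFERENCE_BANDS if low <= wn <= high]
        result[float(wn)] = matches[0] if matches else None
    return result
```

A peak that falls outside every listed band maps to `None` rather than to the nearest band. Unlisted features therefore stay visible instead of being forced into an assignment.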
Data Pipeline
SharePoint Source
- Site: Analytical Data
- Folder: FTIR/FTIR-Data/
- File Type: SPA (Thermo Fisher OMNIC binary format)
Dagster Assets
The FTIR data pipeline consists of the following assets in apps/datasmart/src/datasmart/assets/analytical/ftir.py:
ftir_incremental (sharepoint_multi_asset)
├── ftir_data_incremental (backend.ftir_data_incremental)
└── ftir_samples_incremental (backend.ftir_samples_incremental)
↓
ftir (analytical.ftir)
Asset Descriptions
| Asset | Schema | Description |
|---|---|---|
| ftir_data_incremental | backend | Spectral data (wavenumber, absorbance) per file |
| ftir_samples_incremental | backend | Sample metadata from raw files |
| ftir | analytical | Samples joined with their spectral data; duplicates resolved by keeping the most recent version |
Database Tables
backend.ftir_samples_incremental
Sample metadata from SharePoint files.
| Column | Type | Description |
|---|---|---|
| file_id | string | SharePoint file identifier |
| original_sample_id | string | Sample ID from filename |
| last_update | datetime | Last modification time |
| sp_site | string | SharePoint site |
| file_path | string | File path |
| file_url | string | SharePoint URL |
backend.ftir_data_incremental
Spectral data in long format (one row per file and wavenumber).
| Column | Type | Description |
|---|---|---|
| file_id | string | SharePoint file identifier |
| wavenumber | float | Wavenumber in cm⁻¹ |
| absorbance | float | Absorbance intensity |
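In long format, one spectrum spans many rows. For array-style analysis it can help to reshape the data to one row per file. The following is a minimal pandas sketch on made-up values with the documented columns. It assumes each `(file_id, wavenumber)` pair appears once, because `DataFrame.pivot` raises an error on duplicates. Files recorded on different wavenumber grids will produce NaN wherever their grids do not overlap.

```python
import pandas as pd

# Toy long-format data with the documented columns (values are made up).
long_df = pd.DataFrame(
    {
        "file_id": ["f1", "f1", "f1", "f2", "f2", "f2"],
        "wavenumber": [4000.0, 2000.0, 400.0, 4000.0, 2000.0, 400.0],
        "absorbance": [0.10, 0.35, 0.20, 0.12, 0.30, 0.25],
    }
)

# One row per file, one column per wavenumber.
wide_df = long_df.pivot(index="file_id", columns="wavenumber", values="absorbance")

# FTIR convention: order columns from high to low wavenumber.
wide_df = wide_df.sort_index(axis=1, ascending=False)
print(wide_df)
```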
analytical.ftir
Final joined table with one spectrum per sample (most recent version kept).
| Column | Type | Description |
|---|---|---|
| original_sample_id | string | Sample ID from filename |
| last_update | datetime | Last modification time |
| wavenumber | float | Wavenumber in cm⁻¹ |
| absorbance | float | Absorbance intensity |
| sp_site | string | SharePoint site |
| file_path | string | File path |
| file_url | string | SharePoint URL |
| file_id | string | SharePoint file identifier |
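The final table can be pictured as the deduplicated samples joined to their spectral rows on `file_id`. The following is a minimal pandas sketch on made-up data. It assumes `latest` already holds one row per `original_sample_id` (as in Duplicate Handling) and that the join is an inner join. The actual asset in `ftir.py` may use a different join type or column handling.

```python
import pandas as pd

# Toy stand-in for the deduplicated samples (one row per sample).
latest = pd.DataFrame(
    {
        "file_id": ["b", "c"],
        "original_sample_id": ["S-1", "S-2"],
        "last_update": pd.to_datetime(["2024-03-01", "2024-02-15"]),
    }
)

# Toy stand-in for backend.ftir_data_incremental (long format).
# File "a" plays the role of an older version of S-1 and must not survive.
data = pd.DataFrame(
    {
        "file_id": ["a", "a", "b", "b", "c", "c"],
        "wavenumber": [4000.0, 400.0] * 3,
        "absorbance": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    }
)

# Inner join (assumed): every spectral row of each kept file, plus its metadata.
ftir = latest.merge(data, on="file_id", how="inner")

# Follow the column order listed for analytical.ftir; the SharePoint
# metadata columns are omitted from this toy example.
ftir = ftir[["original_sample_id", "last_update", "wavenumber", "absorbance", "file_id"]]
print(ftir)
```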
File Format
SPA Binary Format
The .spa format is a binary format written by Thermo Fisher OMNIC software. Key parsing details:
- Title: Bytes 30-255 (null-terminated UTF-8 string)
- Number of points: Bytes 564-568 (32-bit integer)
- Wavenumber range: Bytes 576-584 (two 32-bit floats: max, min)
- Absorbance data position: stored as the value that immediately follows a flag value of 3 in the header
Wavenumbers are linearly spaced between min and max values.
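The header layout above can be read with Python's standard `struct` module plus NumPy. This is a minimal sketch, not the pipeline's parser. The names `parse_spa_bytes` and `read_spa` are hypothetical. The sketch also assumes several details that the list above does not state: little-endian byte order, a 16-bit offset word after the flag, a scan starting at byte 288, absorbance stored as 32-bit floats, and absorbance values ordered from the maximum to the minimum wavenumber. Compare its output against the parser in `ftir.py` before relying on it.

```python
import struct

import numpy as np

# Assumption: the flag scan starts at byte 288; the spec above only says the
# data position follows a flag value of 3 in the header.
FLAG_SCAN_START = 288


def parse_spa_bytes(raw: bytes):
    """Return (title, wavenumbers, absorbance) parsed from the bytes of a .spa file."""
    # Title: null-terminated UTF-8 string in bytes 30-255.
    title = raw[30:256].split(b"\x00", 1)[0].decode("utf-8", errors="replace")

    # Number of points: 32-bit integer at offset 564 (little-endian assumed).
    (n_points,) = struct.unpack_from("<i", raw, 564)

    # Wavenumber range: two 32-bit floats at offset 576, max first, then min.
    wn_max, wn_min = struct.unpack_from("<2f", raw, 576)

    # Scan 16-bit words for the flag value 3; the 16-bit word right after it
    # is taken as the byte offset of the absorbance block (assumption).
    data_offset = None
    for pos in range(FLAG_SCAN_START, len(raw) - 3, 2):
        (flag,) = struct.unpack_from("<H", raw, pos)
        if flag == 3:
            (data_offset,) = struct.unpack_from("<H", raw, pos + 2)
            break
    if data_offset is None:
        raise ValueError("no flag value 3 found in header")

    # Absorbance block: n_points values, assumed little-endian float32.
    absorbance = np.frombuffer(raw, dtype="<f4", count=n_points, offset=data_offset)

    # Linearly spaced wavenumbers; pairing them max -> min with the
    # absorbance values is an assumption.
    wavenumbers = np.linspace(wn_max, wn_min, n_points)
    return title, wavenumbers, absorbance.astype(np.float64)


def read_spa(path):
    """Read a .spa file from disk and parse it with parse_spa_bytes."""
    with open(path, "rb") as fh:
        return parse_spa_bytes(fh.read())
```

For example, `title, wn, ab = read_spa("P800-001-1-QBND-FM.spa")` would return the title and two equal-length arrays that can be written out as long-format rows.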
Sample ID Extraction
Sample ID is extracted from the filename by removing the .spa extension:
original_sample_id = file.name[:-4] # Remove .spa
Note: FTIR sample IDs are NOT normalized; the pipeline uses original_sample_id only.
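`file.name[:-4]` assumes every filename ends in a four-character extension such as `.spa`. A slightly more defensive variant strips the suffix only when it is `.spa` (case-insensitive) and otherwise returns the name unchanged. This is an illustration using a hypothetical helper `sample_id_from_filename`, not the pipeline's behaviour. Like the original, it applies no normalization.

```python
from pathlib import PurePosixPath


def sample_id_from_filename(filename: str) -> str:
    """Hypothetical helper: raw sample ID from an FTIR filename (no normalization)."""
    path = PurePosixPath(filename)
    # Strip only a .spa suffix (any case); leave other names untouched.
    if path.suffix.lower() == ".spa":
        return path.stem
    return path.name
```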
Duplicate Handling
When multiple versions of a sample exist, the pipeline keeps the most recent:
# Keep most recent version based on last_update
samples = ftir_samples_incremental.sort_values(by=['last_update'], ascending=False)
samples = samples.drop_duplicates(subset=['original_sample_id'], keep='first')
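The snippet above sorts on a single column. When `DataFrame.sort_values` sorts on a single column, it uses the default `quicksort`, which is not stable. If two files share the same `last_update`, the surviving row is therefore not guaranteed. The sketch below adds `file_id` as a tie-breaker so that the result is deterministic, provided `file_id` is unique. The tie-breaker choice is an assumption, not documented pipeline behaviour, and the data is made up.

```python
import pandas as pd

# Toy stand-in for backend.ftir_samples_incremental (values are made up).
ftir_samples_incremental = pd.DataFrame(
    {
        "file_id": ["f1", "f2", "f3", "f4"],
        "original_sample_id": ["S-1", "S-1", "S-2", "S-2"],
        "last_update": pd.to_datetime(
            ["2024-01-01", "2024-02-01", "2024-03-01", "2024-03-01"]
        ),
    }
)

# Newest first; ties on last_update are broken by file_id (descending), so
# the kept row no longer depends on the input row order.
latest = ftir_samples_incremental.sort_values(
    by=["last_update", "file_id"], ascending=[False, False]
).drop_duplicates(subset=["original_sample_id"], keep="first")
print(latest)
```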
Usage Examples
Query FTIR Data
from shared.db.sql import SQL
# Get FTIR spectrum for a specific sample
spectrum = SQL.read("""
SELECT wavenumber, absorbance
FROM analytical.ftir
WHERE original_sample_id = 'P800-001-1-QBND-FM'
ORDER BY wavenumber
""")
# Get all samples with FTIR data
samples = SQL.read("""
SELECT DISTINCT original_sample_id, last_update
FROM analytical.ftir
ORDER BY last_update DESC
""")
# Filter by project team (sample IDs starting with P800)
ftir_data = SQL.read("""
SELECT * FROM analytical.ftir
WHERE original_sample_id LIKE 'P800%'
""")
Count Data Points per Sample
counts = SQL.read("""
SELECT original_sample_id, COUNT(*) as num_points
FROM analytical.ftir
GROUP BY original_sample_id
ORDER BY num_points DESC
""")
Plotting FTIR Spectra
import matplotlib.pyplot as plt
from shared.db.sql import SQL
# Get spectrum data
sample_id = 'P800-001-1-QBND-FM'
spectrum = SQL.read(f"""
SELECT wavenumber, absorbance
FROM analytical.ftir
WHERE original_sample_id = '{sample_id}'
ORDER BY wavenumber DESC
""")
# Note: FTIR spectra are typically plotted with wavenumber decreasing left-to-right
plt.figure(figsize=(10, 6))
plt.plot(spectrum['wavenumber'], spectrum['absorbance'])
plt.xlabel('Wavenumber (cm⁻¹)')
plt.ylabel('Absorbance')
plt.xlim(4000, 400) # Standard range, high to low
plt.title(f'FTIR Spectrum: {sample_id}')
plt.show()
Compare Multiple Spectra
import matplotlib.pyplot as plt
from shared.db.sql import SQL
sample_ids = ['SAMPLE-001', 'SAMPLE-002', 'SAMPLE-003']
plt.figure(figsize=(12, 6))
for sample_id in sample_ids:
    spectrum = SQL.read(f"""
    SELECT wavenumber, absorbance
    FROM analytical.ftir
    WHERE original_sample_id = '{sample_id}'
    ORDER BY wavenumber DESC
    """)
    plt.plot(spectrum['wavenumber'], spectrum['absorbance'], label=sample_id)
plt.xlabel('Wavenumber (cm⁻¹)')
plt.ylabel('Absorbance')
plt.xlim(4000, 400)
plt.legend()
plt.title('FTIR Spectra Comparison')
plt.show()
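Overlaying spectra works without alignment. Numerical comparison (differences, correlation) needs every spectrum on the same wavenumber grid, and point counts can differ between samples (see Count Data Points per Sample). The following is a minimal sketch using `numpy.interp`, assuming each spectrum is a pair of arrays. `np.interp` requires increasing x values, so the data is sorted first. The function name `to_common_grid` and the 1 cm⁻¹ grid spacing are hypothetical choices.

```python
import numpy as np


def to_common_grid(wavenumber, absorbance, grid):
    """Linearly interpolate one spectrum onto `grid` (grid may be in any order).

    Grid points outside the measured range become NaN instead of being
    extrapolated.
    """
    wn = np.asarray(wavenumber, dtype=float)
    ab = np.asarray(absorbance, dtype=float)
    order = np.argsort(wn)  # np.interp needs increasing sample positions
    return np.interp(grid, wn[order], ab[order], left=np.nan, right=np.nan)


# Hypothetical shared grid from 4000 to 400 cm^-1 in 1 cm^-1 steps (high to low).
grid = np.arange(4000.0, 399.0, -1.0)
```

Inside the loop above, `to_common_grid(spectrum['wavenumber'], spectrum['absorbance'], grid)` would give one aligned array per sample. Those arrays could then be stacked with `np.vstack` for comparison.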
Troubleshooting
Missing Samples
If a sample is missing from analytical.ftir:
- Check whether it exists in backend.ftir_samples_incremental
- Verify that the source file exists in SharePoint
- Check whether the .spa file could be parsed (parse errors are logged)
Verify File Processing
# Check the incremental samples table for a given sample ID pattern
file_info = SQL.read("""
SELECT * FROM backend.ftir_samples_incremental
WHERE original_sample_id LIKE '%SAMPLE%'
ORDER BY last_update DESC
""")
Check Spectrum Data Quality
# Get summary statistics for a spectrum
stats = SQL.read("""
SELECT
original_sample_id,
COUNT(*) as num_points,
MIN(wavenumber) as min_wavenumber,
MAX(wavenumber) as max_wavenumber,
AVG(absorbance) as mean_absorbance
FROM analytical.ftir
WHERE original_sample_id = 'P800-001-1-QBND-FM'
GROUP BY original_sample_id
""")