Skip to main content

FTIR - Fourier Transform Infrared Spectroscopy

Fourier Transform Infrared (FTIR) Spectroscopy is used to identify molecular structures and chemical bonds through infrared absorption. It provides spectral data showing absorbance as a function of wavenumber.

Method Overview

PropertyValue
Full NameFourier Transform Infrared Spectroscopy
PurposeIdentifies molecular structures and chemical bonds through infrared absorption
OutputAbsorbance vs. wavenumber spectra
UnitsWavenumber (cm⁻¹), Absorbance (arbitrary units)
SharePoint LocationAnalytical Data > FTIR > FTIR-Data
File FormatSPA (Bruker OPUS binary format)

Typical Wavenumber Ranges

Range (cm⁻¹)Functional Groups
4000-2500O-H, N-H, C-H stretching
2500-2000Triple bonds (C≡C, C≡N)
2000-1500Double bonds (C=O, C=C, C=N)
1500-400Fingerprint region (complex vibrations)

Common Peaks in Materials Science

Wavenumber (cm⁻¹)Assignment
~3400O-H stretch (water, hydroxides)
~2900C-H stretch (organics)
~1650H-O-H bending (water)
~1400-1500CO₃²⁻ asymmetric stretch (carbonates)
~870CO₃²⁻ out-of-plane bending
~700-800Metal-O vibrations

Data Pipeline

SharePoint Source

  • Site: Analytical Data
  • Folder: FTIR/FTIR-Data/
  • File Type: SPA (Bruker OPUS binary format)

Dagster Assets

The FTIR data pipeline consists of the following assets in apps/datasmart/src/datasmart/assets/analytical/ftir.py:

ftir_incremental (sharepoint_multi_asset)
├── ftir_data_incremental (backend.ftir_data_incremental)
└── ftir_samples_incremental (backend.ftir_samples_incremental)

ftir (analytical.ftir)

Asset Descriptions

AssetSchemaDescription
ftir_data_incrementalbackendSpectral data (wavenumber, absorbance) per file
ftir_samples_incrementalbackendSample metadata from raw files
ftiranalyticalJoined data with duplicates removed by keeping most recent

Database Tables

backend.ftir_samples_incremental

Sample metadata from SharePoint files.

ColumnTypeDescription
file_idstringSharePoint file identifier
original_sample_idstringSample ID from filename
last_updatedatetimeLast modification time
sp_sitestringSharePoint site
file_pathstringFile path
file_urlstringSharePoint URL

backend.ftir_data_incremental

Spectral data in long format.

ColumnTypeDescription
file_idstringSharePoint file identifier
wavenumberfloatWavenumber in cm⁻¹
absorbancefloatAbsorbance intensity

analytical.ftir

Final joined table with one spectrum per sample (most recent version kept).

ColumnTypeDescription
original_sample_idstringSample ID from filename
last_updatedatetimeLast modification time
wavenumberfloatWavenumber in cm⁻¹
absorbancefloatAbsorbance intensity
sp_sitestringSharePoint site
file_pathstringFile path
file_urlstringSharePoint URL
file_idstringSharePoint file identifier

File Format

SPA Binary Format

The .spa format is a binary format used by Bruker OPUS software. Key parsing details:

  • Title: Bytes 30-255 (null-terminated UTF-8 string)
  • Number of points: Bytes 564-568 (32-bit integer)
  • Wavenumber range: Bytes 576-584 (two 32-bit floats: max, min)
  • Absorbance data position: Follows flag value of 3 in header

Wavenumbers are linearly spaced between min and max values.

Sample ID Extraction

Sample ID is extracted from the filename by removing the .spa extension:

original_sample_id = file.name[:-4]  # Remove .spa

Note: FTIR sample IDs are NOT normalized - uses original_sample_id only.

Duplicate Handling

When multiple versions of a sample exist, the pipeline keeps the most recent:

# Keep most recent version based on last_update
samples = ftir_samples_incremental.sort_values(by=['last_update'], ascending=False)
samples = samples.drop_duplicates(subset=['original_sample_id'], keep='first')

Usage Examples

Query FTIR Data

from shared.db.sql import SQL

# Get FTIR spectrum for a specific sample
spectrum = SQL.read("""
SELECT wavenumber, absorbance
FROM analytical.ftir
WHERE original_sample_id = 'P800-001-1-QBND-FM'
ORDER BY wavenumber
""")

# Get all samples with FTIR data
samples = SQL.read("""
SELECT DISTINCT original_sample_id, last_update
FROM analytical.ftir
ORDER BY last_update DESC
""")

# Filter by project team
ftir_data = SQL.read("""
SELECT * FROM analytical.ftir
WHERE original_sample_id LIKE 'P800%'
""")

Count Data Points per Sample

counts = SQL.read("""
SELECT original_sample_id, COUNT(*) as num_points
FROM analytical.ftir
GROUP BY original_sample_id
ORDER BY num_points DESC
""")

Plotting FTIR Spectra

import matplotlib.pyplot as plt
from shared.db.sql import SQL

# Get spectrum data
sample_id = 'P800-001-1-QBND-FM'
spectrum = SQL.read(f"""
SELECT wavenumber, absorbance
FROM analytical.ftir
WHERE original_sample_id = '{sample_id}'
ORDER BY wavenumber DESC
""")

# Note: FTIR spectra are typically plotted with wavenumber decreasing left-to-right
plt.figure(figsize=(10, 6))
plt.plot(spectrum['wavenumber'], spectrum['absorbance'])
plt.xlabel('Wavenumber (cm⁻¹)')
plt.ylabel('Absorbance')
plt.xlim(4000, 400) # Standard range, high to low
plt.title(f'FTIR Spectrum: {sample_id}')
plt.show()

Compare Multiple Spectra

import matplotlib.pyplot as plt
from shared.db.sql import SQL

sample_ids = ['SAMPLE-001', 'SAMPLE-002', 'SAMPLE-003']

plt.figure(figsize=(12, 6))
for sample_id in sample_ids:
spectrum = SQL.read(f"""
SELECT wavenumber, absorbance
FROM analytical.ftir
WHERE original_sample_id = '{sample_id}'
ORDER BY wavenumber DESC
""")
plt.plot(spectrum['wavenumber'], spectrum['absorbance'], label=sample_id)

plt.xlabel('Wavenumber (cm⁻¹)')
plt.ylabel('Absorbance')
plt.xlim(4000, 400)
plt.legend()
plt.title('FTIR Spectra Comparison')
plt.show()

Troubleshooting

Missing Samples

If a sample is missing from analytical.ftir:

  1. Check if it exists in backend.ftir_samples_incremental
  2. Verify the source file exists in SharePoint
  3. Check if the .spa file could be parsed (errors are logged)

Verify File Processing

# Check incremental tables for a specific file
file_info = SQL.read("""
SELECT * FROM backend.ftir_samples_incremental
WHERE original_sample_id LIKE '%SAMPLE%'
ORDER BY last_update DESC
""")

Check Spectrum Data Quality

# Get summary statistics for a spectrum
stats = SQL.read("""
SELECT
original_sample_id,
COUNT(*) as num_points,
MIN(wavenumber) as min_wavenumber,
MAX(wavenumber) as max_wavenumber,
AVG(absorbance) as mean_absorbance
FROM analytical.ftir
WHERE original_sample_id = 'P800-001-1-QBND-FM'
GROUP BY original_sample_id
""")