FTIR - Fourier Transform Infrared Spectroscopy

Fourier Transform Infrared (FTIR) Spectroscopy is used to identify molecular structures and chemical bonds through infrared absorption. It provides spectral data showing absorbance as a function of wavenumber.

Method Overview

Property	Value
Full Name	Fourier Transform Infrared Spectroscopy
Purpose	Identifies molecular structures and chemical bonds through infrared absorption
Output	Absorbance vs. wavenumber spectra
Units	Wavenumber (cm⁻¹), Absorbance (arbitrary units)
SharePoint Location	Analytical Data > FTIR > FTIR-Data
File Format	SPA (Thermo Fisher OMNIC binary format)

Typical Wavenumber Ranges

Range (cm⁻¹)	Functional Groups
4000-2500	O-H, N-H, C-H stretching
2500-2000	Triple bonds (C≡C, C≡N)
2000-1500	Double bonds (C=O, C=C, C=N)
1500-400	Fingerprint region (complex vibrations)
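The following minimal sketch looks up which region of the table above a given wavenumber falls in. The names `REGIONS` and `classify_wavenumber` are hypothetical and not part of the pipeline. Adjacent ranges share their endpoints, so a boundary value such as 2500 cm⁻¹ is arbitrarily assigned to the higher-wavenumber region.

```python
from typing import Optional

# Hypothetical lookup table transcribed from the ranges above: (upper, lower, label) in cm⁻¹.
REGIONS = [
    (4000.0, 2500.0, "O-H, N-H, C-H stretching"),
    (2500.0, 2000.0, "Triple bonds (C≡C, C≡N)"),
    (2000.0, 1500.0, "Double bonds (C=O, C=C, C=N)"),
    (1500.0, 400.0, "Fingerprint region (complex vibrations)"),
]


def classify_wavenumber(wavenumber: float) -> Optional[str]:
    """Return the region label for a wavenumber, or None if outside 4000-400 cm⁻¹."""
    # Regions are checked from high to low, so a shared boundary goes to the higher region.
    for upper, lower, label in REGIONS:
        if lower <= wavenumber <= upper:
            return label
    return None


print(classify_wavenumber(1650))  # Double bonds (C=O, C=C, C=N)
```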

Common Peaks in Materials Science

Wavenumber (cm⁻¹)	Assignment
~3400	O-H stretch (water, hydroxides)
~2900	C-H stretch (organics)
~1650	H-O-H bending (water)
~1400-1500	CO₃²⁻ asymmetric stretch (carbonates)
~870	CO₃²⁻ out-of-plane bending
~700-800	Metal-O vibrations
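The table can also serve as a rough lookup for peaks picked from a spectrum. The sketch below is hypothetical. `PEAK_TABLE`, `assign_peak`, and the ±50 cm⁻¹ default tolerance are illustrative assumptions, because the "~" values above are approximate and real band positions shift with the material. When tolerance windows overlap, more than one candidate can come back, which suits a first-pass annotation.

```python
from typing import List, Tuple

# Hypothetical transcription of the table above: (low, high, assignment) in cm⁻¹.
# Single-valued entries use the same value for low and high.
PEAK_TABLE: List[Tuple[float, float, str]] = [
    (3400.0, 3400.0, "O-H stretch (water, hydroxides)"),
    (2900.0, 2900.0, "C-H stretch (organics)"),
    (1650.0, 1650.0, "H-O-H bending (water)"),
    (1400.0, 1500.0, "CO₃²⁻ asymmetric stretch (carbonates)"),
    (870.0, 870.0, "CO₃²⁻ out-of-plane bending"),
    (700.0, 800.0, "Metal-O vibrations"),
]


def assign_peak(wavenumber: float, tolerance: float = 50.0) -> List[str]:
    """Return every assignment whose band lies within `tolerance` cm⁻¹ of the peak."""
    return [
        label
        for low, high, label in PEAK_TABLE
        if low - tolerance <= wavenumber <= high + tolerance
    ]


print(assign_peak(1435))  # ['CO₃²⁻ asymmetric stretch (carbonates)']
print(assign_peak(850))   # ['CO₃²⁻ out-of-plane bending', 'Metal-O vibrations']
```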

Data Pipeline

SharePoint Source

Site: Analytical Data
Folder: FTIR/FTIR-Data/
File Type: SPA (Thermo Fisher OMNIC binary format)

Dagster Assets

The FTIR data pipeline consists of the following assets in apps/datasmart/src/datasmart/assets/analytical/ftir.py:

ftir_incremental (sharepoint_multi_asset)
├── ftir_data_incremental (backend.ftir_data_incremental)
└── ftir_samples_incremental (backend.ftir_samples_incremental)
          ↓
    ftir (analytical.ftir)

Asset Descriptions

Asset	Schema	Description
`ftir_data_incremental`	backend	Spectral data (wavenumber, absorbance) per file
`ftir_samples_incremental`	backend	Sample metadata from raw files
`ftir`	analytical	Samples joined to their spectra; only the most recent file per sample is kept

Database Tables

backend.ftir_samples_incremental

Sample metadata from SharePoint files.

Column	Type	Description
`file_id`	string	SharePoint file identifier
`original_sample_id`	string	Sample ID from filename
`last_update`	datetime	Last modification time
`sp_site`	string	SharePoint site
`file_path`	string	File path
`file_url`	string	SharePoint URL

backend.ftir_data_incremental

Spectral data in long format.

Column	Type	Description
`file_id`	string	SharePoint file identifier
`wavenumber`	float	Wavenumber in cm⁻¹
`absorbance`	float	Absorbance intensity

analytical.ftir

Final joined table with one spectrum per sample (most recent version kept).

Column	Type	Description
`original_sample_id`	string	Sample ID from filename
`last_update`	datetime	Last modification time
`wavenumber`	float	Wavenumber in cm⁻¹
`absorbance`	float	Absorbance intensity
`sp_site`	string	SharePoint site
`file_path`	string	File path
`file_url`	string	SharePoint URL
`file_id`	string	SharePoint file identifier
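Because analytical.ftir stores spectra in long format (one row per point), comparing samples side by side usually means pivoting to one column per sample. The following minimal pandas sketch assumes the rows are already in a DataFrame, for example from one of the queries under Usage Examples. `to_wide` is a hypothetical helper, and the sample IDs in the demo are placeholders. If samples were measured on different wavenumber grids, they will not share rows, so the result contains NaN wherever a grid point is missing. `pivot` also raises if a sample has repeated wavenumbers, which doubles as a sanity check on the one-spectrum-per-sample rule.

```python
import pandas as pd


def to_wide(ftir_long: pd.DataFrame) -> pd.DataFrame:
    """Pivot analytical.ftir rows into a wavenumber x sample absorbance matrix."""
    wide = ftir_long.pivot(
        index="wavenumber", columns="original_sample_id", values="absorbance"
    )
    # Sort high-to-low to match the conventional FTIR plotting direction.
    return wide.sort_index(ascending=False)


rows = pd.DataFrame({
    "original_sample_id": ["S1", "S1", "S2", "S2"],
    "wavenumber": [400.0, 4000.0, 400.0, 4000.0],
    "absorbance": [0.1, 0.2, 0.3, 0.4],
})
print(to_wide(rows))
```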

File Format

SPA Binary Format

The .spa format is a binary format written by Thermo Fisher (Nicolet) OMNIC software. Key parsing details:

Title: starting at byte offset 30, up to 255 bytes (null-terminated UTF-8 string)
Number of points: offset 564, 4 bytes (32-bit integer)
Wavenumber range: offset 576, 8 bytes (two 32-bit floats: max, then min)
Absorbance data position: given by the header value that follows the flag value 3

Wavenumbers are linearly spaced between min and max values.
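The following minimal, stdlib-only sketch reads a .spa file using the offsets listed above. It is not the pipeline's parser. The name `parse_spa`, little-endian byte order, and the way the data block is located are assumptions taken from a commonly used open-source OMNIC reader recipe. Under that recipe, 16-bit header keys are scanned from offset 288, and the 16-bit value after key 3 gives the data offset. Verify these details against ftir.py before relying on them.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class SpaSpectrum:
    title: str
    wavenumbers: List[float]  # descending, from max to min
    absorbance: List[float]


def parse_spa(raw: bytes) -> SpaSpectrum:
    """Sketch of an OMNIC .spa reader; offsets and byte order are assumptions."""
    # Title: up to 255 bytes starting at offset 30, cut at the first null byte.
    title = raw[30:30 + 255].split(b"\x00", 1)[0].decode("utf-8", errors="replace")

    # Number of points: 32-bit little-endian integer at offset 564.
    (n_points,) = struct.unpack_from("<i", raw, 564)

    # Wavenumber range: two 32-bit floats at offset 576, max first, then min.
    wn_max, wn_min = struct.unpack_from("<2f", raw, 576)

    # Assumed header layout: scan 16-bit keys from offset 288 until key 3 appears;
    # the next 16-bit value is the byte offset of the absorbance block.
    pos = 288
    while True:
        if pos + 4 > len(raw):
            raise ValueError("flag 3 not found in .spa header")
        (flag,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        if flag == 3:
            break
    (data_pos,) = struct.unpack_from("<H", raw, pos)

    # Absorbance: n_points 32-bit floats starting at data_pos.
    absorbance = list(struct.unpack_from(f"<{n_points}f", raw, data_pos))

    # Wavenumbers are not stored per point; rebuild them evenly spaced from max to min.
    step = (wn_min - wn_max) / (n_points - 1) if n_points > 1 else 0.0
    wavenumbers = [wn_max + i * step for i in range(n_points)]
    return SpaSpectrum(title, wavenumbers, absorbance)
```

To try it on a downloaded file, pass the raw bytes, e.g. `parse_spa(pathlib.Path("P800-001-1-QBND-FM.spa").read_bytes())`.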

Sample ID Extraction

Sample ID is extracted from the filename by removing the .spa extension:

original_sample_id = file.name[:-4]  # Remove .spa

Note: FTIR sample IDs are not normalized; the pipeline uses original_sample_id exactly as extracted from the filename.
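The `[:-4]` slice assumes every input ends in .spa. The helper below is hypothetical (`sample_id_from_filename` is not a pipeline function). It applies the same rule but rejects other extensions, and it deliberately performs no normalization.

```python
def sample_id_from_filename(filename: str) -> str:
    """Strip the .spa extension and return the rest of the name unchanged."""
    # Case-insensitive extension check; everything else (case, spaces) is kept as-is,
    # mirroring the "no normalization" rule for FTIR sample IDs.
    if not filename.lower().endswith(".spa"):
        raise ValueError(f"not an .spa file: {filename!r}")
    return filename[:-4]


print(sample_id_from_filename("P800-001-1-QBND-FM.spa"))  # P800-001-1-QBND-FM
```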

Duplicate Handling

When several files map to the same original_sample_id, the pipeline keeps the one with the latest last_update:

# Keep most recent version based on last_update
samples = ftir_samples_incremental.sort_values(by=['last_update'], ascending=False)
samples = samples.drop_duplicates(subset=['original_sample_id'], keep='first')
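The following minimal pandas sketch shows how analytical.ftir could be assembled from the two backend tables. It first deduplicates samples as above, then attaches each surviving file's spectral points via file_id. The inner join on file_id and the helper name `build_ftir` are assumptions based on the table descriptions, not a copy of the asset code in ftir.py.

```python
import pandas as pd

# Output column order, taken from the analytical.ftir table description.
FTIR_COLUMNS = ["original_sample_id", "last_update", "wavenumber", "absorbance",
                "sp_site", "file_path", "file_url", "file_id"]


def build_ftir(samples: pd.DataFrame, data: pd.DataFrame) -> pd.DataFrame:
    """Keep the latest file per sample, then join its spectral points on file_id."""
    latest = (
        samples.sort_values(by=["last_update"], ascending=False)
        .drop_duplicates(subset=["original_sample_id"], keep="first")
    )
    # Inner join: spectra from superseded files are dropped along with their samples.
    merged = latest.merge(data, on="file_id", how="inner")
    return merged[FTIR_COLUMNS].reset_index(drop=True)
```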

Usage Examples

Query FTIR Data

from shared.db.sql import SQL

# Get FTIR spectrum for a specific sample
spectrum = SQL.read("""
    SELECT wavenumber, absorbance
    FROM analytical.ftir
    WHERE original_sample_id = 'P800-001-1-QBND-FM'
    ORDER BY wavenumber
""")

# Get all samples with FTIR data
samples = SQL.read("""
    SELECT DISTINCT original_sample_id, last_update
    FROM analytical.ftir
    ORDER BY last_update DESC
""")

# Filter by project team
ftir_data = SQL.read("""
    SELECT * FROM analytical.ftir
    WHERE original_sample_id LIKE 'P800%'
""")

Count Data Points per Sample

counts = SQL.read("""
    SELECT original_sample_id, COUNT(*) as num_points
    FROM analytical.ftir
    GROUP BY original_sample_id
    ORDER BY num_points DESC
""")

Plotting FTIR Spectra

import matplotlib.pyplot as plt
from shared.db.sql import SQL

# Get spectrum data
sample_id = 'P800-001-1-QBND-FM'
spectrum = SQL.read(f"""
    SELECT wavenumber, absorbance
    FROM analytical.ftir
    WHERE original_sample_id = '{sample_id}'
    ORDER BY wavenumber DESC
""")

# Note: FTIR spectra are typically plotted with wavenumber decreasing left-to-right
plt.figure(figsize=(10, 6))
plt.plot(spectrum['wavenumber'], spectrum['absorbance'])
plt.xlabel('Wavenumber (cm⁻¹)')
plt.ylabel('Absorbance')
plt.xlim(4000, 400)  # Standard range, high to low
plt.title(f'FTIR Spectrum: {sample_id}')
plt.show()

Compare Multiple Spectra

import matplotlib.pyplot as plt
from shared.db.sql import SQL

sample_ids = ['SAMPLE-001', 'SAMPLE-002', 'SAMPLE-003']

plt.figure(figsize=(12, 6))
for sample_id in sample_ids:
    spectrum = SQL.read(f"""
        SELECT wavenumber, absorbance
        FROM analytical.ftir
        WHERE original_sample_id = '{sample_id}'
        ORDER BY wavenumber DESC
    """)
    plt.plot(spectrum['wavenumber'], spectrum['absorbance'], label=sample_id)

plt.xlabel('Wavenumber (cm⁻¹)')
plt.ylabel('Absorbance')
plt.xlim(4000, 400)
plt.legend()
plt.title('FTIR Spectra Comparison')
plt.show()

Troubleshooting

Missing Samples

If a sample is missing from analytical.ftir:

Check whether it exists in backend.ftir_samples_incremental
Verify that the source file exists in SharePoint (Analytical Data > FTIR > FTIR-Data)
Check whether the .spa file could be parsed (parse errors are logged)
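The first two checks amount to a set difference between the .spa files in the SharePoint folder and the sample IDs already loaded. In the sketch below, `find_missing_samples` is illustrative, and how you list the SharePoint folder is left open. The sketch assumes you already have the filenames and the `original_sample_id` values from backend.ftir_samples_incremental.

```python
from typing import Iterable, List


def find_missing_samples(spa_filenames: Iterable[str], loaded_ids: Iterable[str]) -> List[str]:
    """Return sample IDs that have a .spa file but no row in the loaded table."""
    loaded = set(loaded_ids)
    missing = set()
    for name in spa_filenames:
        if not name.lower().endswith(".spa"):
            continue  # ignore non-spectrum files in the folder
        sample_id = name[:-4]  # same rule as the pipeline: drop ".spa", no normalization
        if sample_id not in loaded:
            missing.add(sample_id)
    return sorted(missing)
```

The plotting examples treat SQL.read as returning a pandas DataFrame. If that holds, `loaded_ids` can come from `SQL.read("SELECT original_sample_id FROM backend.ftir_samples_incremental")["original_sample_id"]`.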

Verify File Processing

# Look up every loaded version of a sample in the incremental samples table
file_info = SQL.read("""
    SELECT * FROM backend.ftir_samples_incremental
    WHERE original_sample_id LIKE '%SAMPLE%'
    ORDER BY last_update DESC
""")

Check Spectrum Data Quality

# Get summary statistics for a spectrum
stats = SQL.read("""
    SELECT
        original_sample_id,
        COUNT(*) as num_points,
        MIN(wavenumber) as min_wavenumber,
        MAX(wavenumber) as max_wavenumber,
        AVG(absorbance) as mean_absorbance
    FROM analytical.ftir
    WHERE original_sample_id = 'P800-001-1-QBND-FM'
    GROUP BY original_sample_id
""")
