curve fitting to get overlapping peak areas

| categories: data analysis | View Comments

gc_fitting

Contents

curve fitting to get overlapping peak areas

John Kitchin

Today we examine an approach to fitting curves to overlapping peaks to deconvolute them so we can estimate the area under each curve. You will need to download gc-data-2.txt for this example. This file contains data from a gas chromatograph with two peaks that overlap. We want the area under each peak to estimate the gas composition. You will see how to read the text file in, parse it to get the data for plotting and analysis, and then how to fit it.

function main

read in the data file

The data file is all text, and we have to read it in, and find lines that match certain patterns to identify the regions that are data, then we read in the data lines.

clear all; close all

datafile = 'gc-data-2.txt';

fid = fopen(datafile);

first we get the number of data points, and read up to the data

i = 0;
while 1
    line = fgetl(fid);
    sm = strmatch('# of Points',line);
    if ~isempty(sm)
        regex = '# of Points(.*)';
        [match tokens] = regexp(line,regex,'match','tokens');
        npoints = str2num(tokens{1}{1});
    elseif strcmp(line,'R.Time	Intensity')
        break
    i = i + 1
    end
end

initialize the data vectors

t = zeros(1,npoints);
intensity = zeros(1,npoints);

now read in the data

for j=1:npoints
    line = fgetl(fid);
    [a] = textscan(line,'%f%d');
    t(j) = a{1};
    intensity(j) = a{2};
end

fclose(fid);

Plot the data

plot(t,intensity)
xlabel('Time (s)')
ylabel('Intensity (arb. units)')
xlim([4 6])

correct for non-zero baseline

intensity = intensity + 352;
plot(t,intensity)
xlabel('Time (s)')
ylabel('Intensity (arb. units)')
xlim([4 6])

a fitting function for one peak

The peaks are asymmetric, decaying gaussian functions.

    function f = asym_peak(pars,t)
        % from Anal. Chem. 1994, 66, 1294-1301
        a0 = pars(1);  % peak area
        a1 = pars(2);  % elution time
        a2 = pars(3);  % width of gaussian
        a3 = pars(4);  % exponential damping term
        f = a0/2/a3*exp(a2^2/2/a3^2 + (a1 - t)/a3)...
            .*(erf((t-a1)/(sqrt(2)*a2) - a2/sqrt(2)/a3) + 1);
    end

a fitting function for two peaks

to get two peaks, we simply add two peaks together.

    function f = two_peaks(pars, t)
        a10 = pars(1);  % peak area
        a11 = pars(2);  % elution time
        a12 = pars(3);  % width of gaussian
        a13 = pars(4);  % exponential damping term
        a20 = pars(5);  % peak area
        a21 = pars(6);  % elution time
        a22 = pars(7);  % width of gaussian
        a23 = pars(8);  % exponential damping term

        p1 = asym_peak([a10 a11 a12 a13],t);
        p2 = asym_peak([a20 a21 a22 a23],t);

        f = p1 + p2;
    end

Plot fitting function with an initial guess for each parameter

the fit is not very good.

hold all;
parguess = [1500,4.85,0.05,0.05,5000,5.1,0.05,0.1];
plot (t,two_peaks(parguess,t),'g-')
legend 'raw data' 'initial guess'

nonlinear fitting

now we use nonlinear fitting to get the parameters that best fit our data, and plot the fit on the graph.

pars = nlinfit(t,intensity, @two_peaks, parguess)
plot(t,two_peaks(pars, t),'r-')
legend 'raw data' 'initial guess' 'total fit'
pars =

   1.0e+03 *

    1.3052    0.0049    0.0001    0.0000    5.3162    0.0051    0.0000    0.0001

the fits are not perfect. The small peak is pretty good, but there is an unphysical tail on the larger peak, and a small mismatch at the peak. There is not much to do about that, it means the model peak we are using is not a good model for the peak. We will still integrate the areas though.

now extract out the two peaks and integrate the areas

pars1 = pars(1:4)
pars2 = pars(5:8)

peak1 = asym_peak(pars1, t);
peak2 = asym_peak(pars2, t);

plot(t,peak1)
plot(t,peak2)
legend 'raw data' 'initial guess' 'total fit' 'peak 1' 'peak 2';

area1 = trapz(t, peak1)
area2 = trapz(t, peak2)
pars1 =

   1.0e+03 *

    1.3052    0.0049    0.0001    0.0000


pars2 =

   1.0e+03 *

    5.3162    0.0051    0.0000    0.0001


area1 =

   1.3052e+03


area2 =

   5.3162e+03

Compute relative amounts

percent1 = area1/(area1 + area2)
percent2 = area2/(area1 + area2)
percent1 =

    0.1971


percent2 =

    0.8029

This sample was air, and the first peak is oxygen, and the second peak is nitrogen. we come pretty close to the actual composition of air, although it is low on the oxygen content. To do better, one would have to use a calibration curve.

end

% categories: data analysis

% post_id = 1994; %delete this line to force new post;
% permaLink = http://matlab.cheme.cmu.edu/2012/06/22/curve-fitting-to-get-overlapping-peak-areas/;
blog comments powered by Disqus