## Peak finding in Raman spectroscopy

| categories: data analysis | View Comments

spline_fit_peak

## Peak finding in Raman spectroscopy

John Kitchin

Raman spectroscopy is a vibrational spectroscopy. The data typically comes as intensity vs. wavenumber, and it is discrete. Sometimes it is necessary to identify the precise location of a peak. In this post, we will use spline smoothing to construct an interpolating function of the data, and then use fminbnd to identify peak positions.

clear all; close all


## Read and plot the raw data

[w,i] = textread('raman.txt');

plot(w,i)
xlabel('Raman shift (cm^{-1})')
ylabel('Intensity (counts)')


## Narrow to the region of interest

We will focus on the peaks between 1340 and 1360 cm^-1.

ind = w > 1340 & w < 1360;
w1 = w(ind);
i1 = i(ind);
figure(2)
plot(w1, i1,'b. ')
xlabel('Raman shift (cm^{-1})')
ylabel('Intensity (counts)')


## Generate the spline

The last parameter in csaps determines the amount of smoothing. Closer to 1.0 is less smoothing.

pp = csaps(w1, i1,.99)
hold all
plot(w1, ppval(pp,w1))

pp =

form: 'pp'
breaks: [1x51 double]
coefs: [50x4 double]
pieces: 50
order: 4
dim: 1



## Find all the zeros in the first derivative

We will brute force this by looking for sign changes in the derivative vector evaluated at all the points of the original data

d1 = fnder(pp, 1); % first derivative
s = ppval(d1, w1);
s(s >= 0) = 1;
s(s < 0) = 0;

% where diff = -1 indicates a transition from positive slope to
% negative slope, which is a maximum. These are approximate locations
% of the zeros
z = find(diff(s) == -1);


## Make an interpolating function to find the minima

to use fminbnd we need a function. we make a function handle that is the negative of the main function, so that the peak locations are located at minima. interpolating function handle

f = @(xx) -ppval(pp,xx);

% now loop through each value found in z to find the minimum over the
% interval around each zero.
x = zeros(size(z));
for i=1:length(x)
lowerbound = w1(z(i)-5);
upperbound = w1(z(i)+5);
x(i) = fminbnd(f,lowerbound,upperbound);
end


## Plot the derivatives and peaks found

figure
hold all
plot(w1, ppval(d1,w1))
plot(w1(z), ppval(d1, x),'ro ')

xlabel('Raman shift (cm^{-1})')
legend('1^{st} derivative', 'zeros')


## Label the peaks in the original data

figure(2)
plot(x, ppval(pp,x), 'ro','markerfacecolor','r')

% print some offset labels
for i=1:length(x)
sprintf('%1.1f cm^{-1}',x(i))
end

ans =

1346.5 cm^{-1}

ans =

1348.1 cm^{-1}



## Discussion

In the end, we have illustrated how to construct a spline smoothing interpolation function and to find maxima in the function, including generating some initial guesses. There is more art to this than you might like, since you have to judge how much smoothing is enough or too much. With too much, you may smooth peaks out. With too little, noise may be mistaken for peaks.

## done

% categories: data analysis
% post_id = 2007; %delete this line to force new post;


## curve fitting to get overlapping peak areas

| categories: data analysis | View Comments

gc_fitting

## curve fitting to get overlapping peak areas

John Kitchin

Today we examine an approach to fitting curves to overlapping peaks to deconvolute them so we can estimate the area under each curve. You will need to download gc-data-2.txt for this example. This file contains data from a gas chromatograph with two peaks that overlap. We want the area under each peak to estimate the gas composition. You will see how to read the text file in, parse it to get the data for plotting and analysis, and then how to fit it.

function main


## read in the data file

The data file is all text, and we have to read it in, and find lines that match certain patterns to identify the regions that are data, then we read in the data lines.

clear all; close all

datafile = 'gc-data-2.txt';

fid = fopen(datafile);


## first we get the number of data points, and read up to the data

i = 0;
while 1
line = fgetl(fid);
sm = strmatch('# of Points',line);
if ~isempty(sm)
regex = '# of Points(.*)';
[match tokens] = regexp(line,regex,'match','tokens');
npoints = str2num(tokens{1}{1});
elseif strcmp(line,'R.Time	Intensity')
break
i = i + 1
end
end


## initialize the data vectors

t = zeros(1,npoints);
intensity = zeros(1,npoints);


## now read in the data

for j=1:npoints
line = fgetl(fid);
[a] = textscan(line,'%f%d');
t(j) = a{1};
intensity(j) = a{2};
end

fclose(fid);


## Plot the data

plot(t,intensity)
xlabel('Time (s)')
ylabel('Intensity (arb. units)')
xlim([4 6])


## correct for non-zero baseline

intensity = intensity + 352;
plot(t,intensity)
xlabel('Time (s)')
ylabel('Intensity (arb. units)')
xlim([4 6])


## a fitting function for one peak

The peaks are asymmetric, decaying gaussian functions.

    function f = asym_peak(pars,t)
% from Anal. Chem. 1994, 66, 1294-1301
a0 = pars(1);  % peak area
a1 = pars(2);  % elution time
a2 = pars(3);  % width of gaussian
a3 = pars(4);  % exponential damping term
f = a0/2/a3*exp(a2^2/2/a3^2 + (a1 - t)/a3)...
.*(erf((t-a1)/(sqrt(2)*a2) - a2/sqrt(2)/a3) + 1);
end


## a fitting function for two peaks

to get two peaks, we simply add two peaks together.

    function f = two_peaks(pars, t)
a10 = pars(1);  % peak area
a11 = pars(2);  % elution time
a12 = pars(3);  % width of gaussian
a13 = pars(4);  % exponential damping term
a20 = pars(5);  % peak area
a21 = pars(6);  % elution time
a22 = pars(7);  % width of gaussian
a23 = pars(8);  % exponential damping term

p1 = asym_peak([a10 a11 a12 a13],t);
p2 = asym_peak([a20 a21 a22 a23],t);

f = p1 + p2;
end


## Plot fitting function with an initial guess for each parameter

the fit is not very good.

hold all;
parguess = [1500,4.85,0.05,0.05,5000,5.1,0.05,0.1];
plot (t,two_peaks(parguess,t),'g-')
legend 'raw data' 'initial guess'


## nonlinear fitting

now we use nonlinear fitting to get the parameters that best fit our data, and plot the fit on the graph.

pars = nlinfit(t,intensity, @two_peaks, parguess)
plot(t,two_peaks(pars, t),'r-')
legend 'raw data' 'initial guess' 'total fit'

pars =

1.0e+03 *

1.3052    0.0049    0.0001    0.0000    5.3162    0.0051    0.0000    0.0001



the fits are not perfect. The small peak is pretty good, but there is an unphysical tail on the larger peak, and a small mismatch at the peak. There is not much to do about that, it means the model peak we are using is not a good model for the peak. We will still integrate the areas though.

## now extract out the two peaks and integrate the areas

pars1 = pars(1:4)
pars2 = pars(5:8)

peak1 = asym_peak(pars1, t);
peak2 = asym_peak(pars2, t);

plot(t,peak1)
plot(t,peak2)
legend 'raw data' 'initial guess' 'total fit' 'peak 1' 'peak 2';

area1 = trapz(t, peak1)
area2 = trapz(t, peak2)

pars1 =

1.0e+03 *

1.3052    0.0049    0.0001    0.0000

pars2 =

1.0e+03 *

5.3162    0.0051    0.0000    0.0001

area1 =

1.3052e+03

area2 =

5.3162e+03



## Compute relative amounts

percent1 = area1/(area1 + area2)
percent2 = area2/(area1 + area2)

percent1 =

0.1971

percent2 =

0.8029



This sample was air, and the first peak is oxygen, and the second peak is nitrogen. we come pretty close to the actual composition of air, although it is low on the oxygen content. To do better, one would have to use a calibration curve.

end

% categories: data analysis

% post_id = 1994; %delete this line to force new post;


plotties4

## Colors, 3D Plotting, and Data Manipulation

Guest authors: Harrison Rose and Breanna Stillo

In this post, Harrison and Breanna present three-dimensional experimental data, and show how to plot the data, fit curves through the data, and plot surfaces. You will need to download mycmap.mat

function main

close all
clear all
clc


In order to generate plots with a consistent and hopefully attractive color scheme, we will generate these color presets. Matlab normalizes colors to [R G B] arrays with values between 0 and 1. Since it is easier to find RGB values from 0 to 255, however, we will simply normalize them ourselves. A good colorpicker is http://www.colorpicker.com.

pcol = [255,0,0]/255; % red
lcol = [135,14,179]/255; % a pinkish color


## Raw Data

This is raw data from an experiment for three trials (a and b and c). The X and Y are independent variables, and we measured Z.

X_a = [8.3 8.3 8.3 8.3 8.3 8.3 8.3];
X_b = [11 11 11 11 11 11 11];
X_c = [14 14 14 14 14 14 14];

Y_a = [34 59 64 39 35 36 49];
Y_b = [39 32 27 61 52 57 65];
Y_c = [63 33 38 50 54 68 22];

Z_a = [-3.59833 7.62 0 4.233333333 -2.54 -0.635 7.209];
Z_b = [16.51 10.16 6.77 5.08 15.24 13.7 3.048];
Z_c = [36 20 28 37 40 32 10];


## Plotting the raw data for all three trials:

As you can see, the raw data does not look like very much, and it is pretty hard to interperet what it could mean.

We do see, however, that since X_a is all of one value, X_b is all of another value, and X_c is all of a third, that the data lies entirely on three separate planes.

figure
hold on     % Use this to plot multiple series on a single figure
plot3(X_a,Y_a,Z_a,'.','Color',pcol)
plot3(X_b,Y_b,Z_b,'.','Color',pcol)
plot3(X_c,Y_c,Z_c,'.','Color',pcol)
hold off    % Use this to make sure the next plot command will not
% try to plot on this figure as well.
title('Raw Experimental Data for Trials A, B, and C')
xlabel('x Data')
ylabel('y Data')
zlabel('z Data')
grid on     % 3D data is easier to visualize with the grid. Normally
% the grid defaults to 'on' but using the 'hold on'
% command as we did above causes the grid to default to
% 'off'


## A note on the view:

The command

 view(Az,El)

lets you view a 3D plot from the same viewpoint each time you run the code. To determine the best viewpoint to use, use the click the 'Rotate 3D' icon in the figure toolbar (it looks like a box with a counterclockwise arrow around it), and drag your plot around to view it from different angles. You will notice the text "Az: ## El: ##" appear in the lower left corner of the figure window. This stands for Azimuth and Elevation which represent a point in spherical coordinates from which to view the plot (the radius is fixed by the axes sizes). The command used here will always display the plot from azimuth = -39, and elevation = 10.

view(-39,10)


## A closer look at the raw data:

figure
hold on
plot(Y_a,Z_a,'o','MarkerFaceColor',pcol,'MarkerEdgeColor','none')
title('Raw Data for Trial A, x = 8.3')
xlabel('y')
ylabel('z')
hold off

figure
hold on
plot(Y_b,Z_b,'o','MarkerFaceColor',pcol,'MarkerEdgeColor','none')
title('Raw Data for Trial B, x = 11')
xlabel('y')
ylabel('z')
hold off

figure
hold on
plot(Y_c,Z_c,'o','MarkerFaceColor',pcol,'MarkerEdgeColor','none')
title('Raw Data for Trial C, x = 14')
xlabel('y')
ylabel('z')
hold off


## Fitting the raw data:

In this case, we expect the data to fit the shape of a binomial distribution, so we use the following fit function with three parameters:

    function v = mygauss(par, t)
A  = par(1);
mu = par(2);
s  = par(3);

v=A*exp(-(t-mu).^2./(2*s.^2));
end


## Fitting the data

res = 20;
Yfit=linspace(20,70,res);

% Dataset A
guesses=[20, 40, 20];
[pars residuals J]=nlinfit(Y_a, Z_a-min(Z_a), @mygauss, guesses);

A1=pars(1);
mu1=pars(2);
s1=pars(3);

Zfitfun_a=@(y) A1*exp(-(y-mu1).^2./(2*s1.^2))+min(Z_a);
% Note: We have to shift the dataset up to zero because a gaussian
% typically cannot go below the horizontal axis

% Generate a fit-line through the data

Zfit_a=Zfitfun_a(Yfit);

% Dataset B
guesses=[10, 25, 20];
[pars residuals J]=nlinfit(Y_b, Z_b, @mygauss, guesses);

A2=pars(1);
mu2=pars(2);
s2=pars(3);

Zfitfun_b=@(y) A2*exp(-(y-mu2).^2./(2*s2.^2));

% Generate a fit-line through the data
Zfit_b=Zfitfun_b(Yfit);

% Dataset c
guesses=[20, 60, 20];
[pars residuals J]=nlinfit(Y_c, Z_c, @mygauss, guesses);

A3=pars(1);
mu3=pars(2);
s3=pars(3);

Zfitfun_c=@(y) A3*exp(-(y-mu3).^2./(2*s3.^2));

% Generate a fit-line through the data
Zfit_c=Zfitfun_c(Yfit);


## Plotting the fitted data:

% Trial A
figure
hold on
plot(Y_a,Z_a,'o','MarkerFaceColor',pcol,'MarkerEdgeColor','none')
title('Fitted Data for Trial A, x = 8.3')
xlabel('y')
ylabel('z')
plot(Yfit,Zfit_a,'Color',lcol,'LineWidth',2)
hold off

% Trial B
figure
hold on
plot(Y_b,Z_b,'o','MarkerFaceColor',pcol,'MarkerEdgeColor','none')
title('Fitted Data for Trial B, x = 11')
xlabel('y')
ylabel('z')
plot(Yfit, Zfit_b,'Color',lcol,'LineWidth',2)
hold off

% Trial C
figure
hold on
plot(Y_c,Z_c,'o','MarkerFaceColor',pcol,'MarkerEdgeColor','none')
title('Fitted Data for Trial C, x = 14')
xlabel('y')
ylabel('z')
plot(Yfit,Zfit_c,'Color',lcol,'LineWidth',2)
hold off


## Generate a surface plot:

For every point along the fit-line for dataset A, connect it to the corresponding point along the fit-line for dataset B using a straight line. This linear interpolation is done automatically by the surf command. If more datasets were available, we could use a nonlinear fit to produce a more accurate surface plot, but for now, we will assume that the experiment is well-behaved, and that 11 and 8.3 are close enough together that we can use linear interpolation between them.

Interpolate along the fit lines to produce YZ points for each data series (X):

## Fits through the fits

Yspace = linspace(25,65,res);
Xs = [8.3; 11; 14];
Xspan1 = linspace(8.3,14,res);
Xspan2 = linspace(8,14.3,res);
q = 0;
r = 0;
for i = 1:res
Zs = [Zfitfun_a(Yspace(i)); Zfitfun_b(Yspace(i)); Zfitfun_c(Yspace(i))];
Yspan = linspace(Yspace(i),Yspace(i),res);
p(:,i) = polyfit(Xs,Zs,2);
for j = 1:res
Zfit_fit(i,j) = polyval(p(:,i),Xspan1(j));
end
end


## Plot a surface through the fits through the fits

figure
hold all
surf(Xspan1,Yspace,Zfit_fit,'EdgeColor','black');   % Generate the surface
%surf(Xsurf,Ysurf,Zsurf,'EdgeColor','black');   % Generate the surface

% plot leading and lagging lines (just for show!)
for i = 1:res
Zs = [Zfitfun_a(Yspace(i)); Zfitfun_b(Yspace(i)); Zfitfun_c(Yspace(i))];
Yspan = linspace(Yspace(i),Yspace(i),res);
plot3(Xspan2,Yspan,polyval(p(:,i),Xspan2),'Color','black')
end

%Plot the raw data:
plot3(X_a,Y_a,Z_a,'o','markersize',5,'MarkerFaceColor',pcol,'MarkerEdgeColor','none')
plot3(X_b,Y_b,Z_b,'o','markersize',5,'MarkerFaceColor',pcol,'MarkerEdgeColor','none')
plot3(X_c,Y_c,Z_c,'o','markersize',5,'MarkerFaceColor',pcol,'MarkerEdgeColor','none')

%Plot the fit lines:
plot3(linspace(8.3,8.3,res),Yfit,Zfit_a,'Color',lcol,'LineWidth',2)
plot3(linspace(11,11,res),Yfit,Zfit_b,'Color',lcol,'LineWidth',2)
plot3(linspace(14,14,res),Yfit,Zfit_c,'Color',lcol,'LineWidth',2)
view(-39,10)
alpha(.2)   % Make the plot transparent
title('z vs. x vs. y')
xlabel('x')
ylabel('y')
zlabel('z')
grid on


## Add a line through the max

We must minimize the negative of each of our initial three fits to find the maximum of the fit.

f1 = @(y) -Zfitfun_a(y);
f2 = @(y) -Zfitfun_b(y);
f3 = @(y) -Zfitfun_c(y);

Ystar_a = fminbnd(f1,0,100);
Ystar_b = fminbnd(f2,0,100);
Ystar_c = fminbnd(f3,0,100);

Ystars = [Ystar_a; Ystar_b; Ystar_c];
Zstars = [Zfitfun_a(Ystar_a); Zfitfun_b(Ystar_b); Zfitfun_c(Ystar_c)];

hold on
plot3(Xs,Ystars,Zstars,'o','markersize',7,'MarkerFaceColor','white')

xy = polyfit(Xs,Ystars,2);
xz = polyfit(Xs,Zstars,2);

plot3(Xspan1,polyval(xy,Xspan1),polyval(xz,Xspan1),'Color','yellow','LineWidth',2)


## A note on colormaps

To edit the color of the surface, you can apply a colormap. Matlab has several built-in colormaps (to see them, type 'doc colormap' in the Command Window. As you see here, however, we are using our own colormap, which has been stored in the file 'mycmap.mat'

To modify a colormap, type

 cmapeditor

in the Command Window or click 'Edit' 'Colormap...' in the figure window. For instructions on how to use the colormap editor, type 'doc colormapeditor' in the Command Window. If you have 'Immediate apply' checked, or you click 'Apply' the colormap will load onto the figure. To save a colormap, type the following into the Command Window:

 mycmap = get(gcf,'Colormap')
 save('mycmap')
s = load('mycmap.mat');
newmap = s.mycmap;
set(gcf,'Colormap',newmap)  % See corresponding 'Note on colormaps')

end


## Summary

Matlab offers a lot of capability to analyze and present data in 3-dimensions.

% categories: plotting, data analysis


## are two averages different?

| categories: data analysis | View Comments

are two averages different?

# are two averages different?

John Kitchin 1/28/2012

## Contents

Class A had 30 students who received an average test score of 78, with standard deviation of 10. Class B had 25 students an average test score of 85, with a standard deviation of 15. We want to know if the difference in these averages is statistically relevant. Note that we only have estimates of the true average and standard deviation for each class, and there is uncertainty in those estimates. As a result, we are unsure if the averages are really different. It could have just been luck that a few students in class B did better.

## The hypothesis:

the true averages are the same. We need to perform a two-sample t-test of the hypothesis that (this is often called the null hypothesis). we use a two-tailed test because we do not care if the difference is positive or negative, either way means the averages are not the same.

## the data

n1 = 30; % students in class A
x1 = 78; % average grade in class A
s1 = 10; % std dev of exam grade in class A

n2 = 25; % students in class B
x2 = 85; % average grade in class B
s2 = 15; % std dev of exam grade in class B


## Compute the standard error

this is the standard error of the difference between the two averages. Note the standard error of the difference is less than the standard error of the individual averages.

SE = sqrt(s1^2/n1 + s2^2/n2)

SE =

3.5119



## calculate degrees of freedom

see the discussion at http://stattrek.com/Help/Glossary.aspx?Target=Two-sample%20t-test for a more complex definition of degrees of freedom. Here we simply subtract one from each sample size to account for the estimation of the average of each sample.

DF = (n1 - 1) + (n2 - 1)

DF =

53



## compute the t-score for our data

The difference between two averages determined from small sample numbers follows the t-distribution. the t-score is the difference between the difference of the means and the hypothesized difference of the means, normalized by the standard error. we compute the absolute value of the t-score to make sure it is positive for convenience later.

tscore = abs(((x1 - x2)-0)/SE)

tscore =

1.9932



## Interpretation

A way to approach determinining if the difference is significant or not is to ask, does our computed average fall within a confidence range of the hypothesized value (zero)? If it does, then we can attribute the difference to statistical variations at that confidence level. If it does not, we can say that statistical variations do not account for the difference at that confidence level, and hence the averages must be different.

Lets compute the t-value that corresponds to a 95% confidence level for a mean of zero with the degrees of freedom computed earlier. This means that 95% of the t-scores we expect to get will fall within t95.

ci = 0.95;
alpha = 1 - ci;
t95 = tinv(1-alpha/2, DF)

t95 =

2.0057



since tscore < t95, we conclude that at the 95% confidence level we cannot say these averages are statistically different because our computed t-score falls in the expected range of deviations. Note that our t-score is very close to the 95% limit. Let's consider a smaller confidence interval.

ci = 0.94;
alpha = 1 - ci;
t94 = tinv(1-alpha/2, DF)

t94 =

1.9219



at the 94% confidence level, however, tscore > t94, which means we can say with 94% confidence that the two averages are different; class B performed better than class A did. Alternatively, there is only about a 6% chance we are wrong about that statement.

## another way to get there

An alternative way to get the confidence that the averages are different is to directly compute it from the cumulative t-distribution function. We compute the difference between all the t-values less than tscore and the t-values less than -tscore, which is the fraction of measurements that are between them. You can see here that we are practically 95% sure that the averages are different.

f = tcdf(tscore,DF)-tcdf(-tscore,DF)

f =

0.9486


'done'
% categories: data analysis

ans =

done



## Using Java inside Matlab

Using Java inside Matlab

# Using Java inside Matlab

John Kitchin

Matlab is pretty good, but it doesn't do everything, and you might not always have the toolbox that contains the function you want. There might even be an excellent open-source software library that can't be included in Matlab, but can be used within Matlab.

Matlab has excellent support for using Java with minimal effort. A repository of numerical libraries can be found at http://math.nist.gov/javanumerics/. In this example we install the commons-math Java library, and show an example of using it.

## Contents

Here are a few things the commons libary can do:

  Computing means, variances and other summary statistics for a list of numbers
Fitting a line to a set of data points using linear regression
Finding a smooth curve that passes through a collection of points (interpolation)
Fitting a parametric model to a set of measurements using least-squares methods
Solving equations involving real-valued functions (i.e. root-finding)
Solving systems of linear equations
Solving Ordinary Differential Equations
Minimizing multi-dimensional functions
Generating random numbers with more restrictions (e.g distribution, range) than what is possible using the JDK
Generating random samples and/or datasets that are "like" the data in an input file
Performing statistical significance tests
Miscellaneous mathematical functions such as factorials, binomial coefficients and "special functions" (e.g. gamma, beta functions)

Many of them duplicate features in Matlab, but they may implement different algorithms, or more convenient syntax.

clear all; clc; close all


These two commands only need to be run once, so comment them out after you run them.

%urlwrite('http://mirror.metrocast.net/apache/commons/math/binaries/commons-math-2.2.zip','commons-math-2.2.zip')
%unzip('commons-math-2.2.zip')


We have to add the jar file to the javaclasspath so we can use it.

javaaddpath([pwd '/commons-math-2.2/commons-math-2.2.jar'])


## linear regression with Matlab

Note that the regress command is part of the Statistics toolbox, which is great if you have it! We show here how to do it, so we also have something to compare the Java result to.

x = [1 2 3 4 5]';
y = [3 5 7 14 11]';

[b bint]= regress(y,[x.^0 x])

b =

0.5000
2.5000

bint =

-7.5615    8.5615
0.0694    4.9306



## Now the Java way

The first thing we do is import the Java library. the * at the end indicates that all classes should be imported.

import org.apache.commons.math.stat.regression.*


## Create a class instance.

The SimpleRegression class has several methods that we can call later to perform analysis

regression = SimpleRegression();
methods(regression)

Methods for class org.apache.commons.math.stat.regression.SimpleRegression:

SimpleRegression            getSlope
clear                       getSlopeStdErr
equals                      getSumSquaredErrors
getClass                    getTotalSumSquares
getIntercept                hashCode
getInterceptStdErr          notify
getMeanSquareError          notifyAll
getN                        predict
getR                        toString
getRSquare                  wait
getRegressionSumSquares
getSignificance



regression.addData([x y]);


## get the slope and some statistics of the slope

m = regression.getSlope()
mstd = regression.getSlopeStdErr()
mint = regression.getSlopeConfidenceInterval(); % 95% by default
[m - mint  m + mint] % this is the 95% interval

m =

2.5000

mstd =

0.7638

ans =

0.0694    4.9306



Note the confidence interval is greater than +- 2*sigma. This is because of the student-t table multiplier on the standard error due to the small number

## get the intercept and some statistics

note there is not getInterceptConfidenceInterval method for some reason!

b = regression.getIntercept()
bstd = regression.getInterceptStdErr()

sprintf('R^2 = %1.3f', regression.getRSquare())

b =

0.5000

bstd =

2.5331

ans =

R^2 = 0.781



## Summary

It takes more lines of code to get the Java version of linear regression, but you have finer control over what data you get, and what order you get it in. You still have to read the Java documentation to learn the command syntax, and debugging can be somewhat more difficult than straight Matlab. However, the integration of Java into Matlab opens some interesting possibilities to extend Matlab's functionality.

% categories: Miscellaneous, data analysis