# Microsoft Office Application Help - Excel Help forum > Excel Programming / VBA / Macros >  >  non-linear regression: fitting data to a sigmoidal (psychophysical) curve

## chiasmata

Hello all,

I am trying to fit the data I obtained from performing an experiment to a sigmoidal/psychophysical curve that looks like the attachment provided. I am currently able to do this using Sigmaplot software but it is so time consuming, and all of my data is already in excel, soI was wondering if I could use excel for this. 

Here is an example of what the data looks like. I used Sigmaplot to obtain the curve fitting seen in the attachment. Sigmaplot also provides me with two columns where they show the extrapolated x and y points. I need to extrapolate a 75% threshold from the curve, so I need to extrapolate a value of x that goes with a value of 75% of y. In the second attachment, you can see how this works, look at the highlighted cells. 
Help with this will save me weeks in analysis time!!! Thank you and good karma to you! :Smilie:  :Smilie: !

% difficulty (x-axis)  % correct (y-axis)
0%...........................................100%
10%..........................................95%
20%......................................... 95%
30%..........................................85%
40%........................................ .90%
50%........................................ .60%
60%.........................................65%
70%........................................ 55% 
80%........................................ 50%
90%........................................ 60%
100%........................................45%

----------


## romperstomper

Which version of Excel are you using? Your profile says 2007 for Mac, but that doesn't exist.

----------


## MrShorty

Does Sigmaplot use a least squares regression algorithm or some other?  Do you know what equation Sigmaplot is using to fit the data?

In general, I see two ways to do regression in Excel:

1) Use LINEST with a "linear" (as used in linear algebra, not referring strictly to straight lines) or "linearizable" (is that a word?) function.  LINEST uses a least squares algorithm

2) For "non-linear" and "non-linearizable" functions, you can use Solver to obtain regression parameters.  You can still set it up to use a least squares algorithm, or, if you program the spreadsheet for it, you can use other optimization algorithms.

1st step in a problem like this, IMO, is to select the form of the equation you want to fit the data to.

----------


## shg

I think it gets regressed to the logistics function 

y = 1 / (1 + exp(-x))

... parameterized to account for shift and scale as 

y = cc + a / (1 + exp((x0 - x)/b)

... which regresses (using Solver) to an RMS error of ≈ 5% with 

a ≈ 0.36
b ≈ -0.02
cc ≈ 0.57
x0 ≈ 0.45

----------


## chiasmata

Oh, sorry, I meant 2008 for mac. I fixed this in my profile. Thanks for bringing it to my attention.

----------


## chiasmata

> Does Sigmaplot use a least squares regression algorithm or some other?  Do you know what equation Sigmaplot is using to fit the data?
> 
> In general, I see two ways to do regression in Excel:
> 
> 1) Use LINEST with a "linear" (as used in linear algebra, not referring strictly to straight lines) or "linearizable" (is that a word?) function.  LINEST uses a least squares algorithm
> 
> 2) For "non-linear" and "non-linearizable" functions, you can use Solver to obtain regression parameters.  You can still set it up to use a least squares algorithm, or, if you program the spreadsheet for it, you can use other optimization algorithms.
> 
> 1st step in a problem like this, IMO, is to select the form of the equation you want to fit the data to.



I'll be honest, I only took first year statistics so I'm not a stats guru yet. My proff had basically told me which buttons to press so I can tell you what steps I've been taking: regression wizard-->Equation category:Sigmoidal-->Equation name: Logistic, 4 parameter. I'm attaching a screenshot of the first window in the wizard, since it includes the equation. After selecting the columns containing the x-axis and y-axis data, I get the window with a, b, x0, y0 results that I also attached. All I know is how to get the number I need out of this setup, sorry I can't be of more help....

----------


## chiasmata

> I think it gets regressed to the logistics function 
> 
> y = 1 / (1 + exp(-x))
> 
> ... parameterized to account for shift and scale as 
> 
> y = cc + a / (1 + exp((x0 - x)/b)
> 
> ... which regresses (using Solver) to an RMS error of ≈ 5% with 
> ...



Yes, I think you're right, "y = cc + a / (1 + exp((x0 - x)/b)" seems to be the equation Sigmaplot uses, as can be seen in the screenshot I'm  re-attaching. But the numbers the wizard ended up with were different...still, how can I use this information to perform this analysis in Excel directly? I've never done anything this advanced in Excel.

----------


## MrShorty

Logistics function is not linearizable, so we'll have to resort to the purely numerical approach.

Basic steps to do this same thing in Excel:

1) column A, put the known_x values.  Column B, put the known_y values.  Somewhere (I usually put them at the top), put the values for the equation parameters.

2)  column C, put the logistics formula as described.  Will look something like =$A$2+$B$2/(1+exp(($C$2-A10)/$D$2))

3) column D = (B10-C10)^2 : This is the square of the deviation, because we are going to assume you want to use a least squares algorithm

4) at the top or bottom of column D, sum up the deviations =sum(d10:d100)

5) Now we have the spreadsheet set up to calculate the sum of the squares of the deviations at some assumed parameters for the equation.  Now run the Solver add-in, telling it to minimize the cell from step 4 by changing the cells containing the equation parameters.

6) Plot the regression curve to make sure it is reasonable.

----------


## chiasmata

> Logistics function is not linearizable, so we'll have to resort to the purely numerical approach.
> 
> Basic steps to do this same thing in Excel:
> 
> 1) column A, put the known_x values.  Column B, put the known_y values.  Somewhere (I usually put them at the top), put the values for the equation parameters.
> 
> 2)  column C, put the logistics formula as described.  Will look something like =$A$2+$B$2/(1+exp(($C$2-A10)/$D$2))
> 
> 3) column D = (B10-C10)^2 : This is the square of the deviation, because we are going to assume you want to use a least squares algorithm
> ...




Hi MrShorty,

Thanks so much for the detailed response! Just to be sure I'm doing this right (and bear with me here, please): 

*for column C, do I only put the logistics formula in once or in every cell of the formula, with "2" replaced by "3", "4" etc as I go down the column? Also, for column C I'm getting a "cyclical formula" error that tells me that I cannot have a formula in a cell that refers to that cell (this is assuming the numbers are replaced with the corresponding location within the column). 
*For column D, I'm sorry, I don't understand why I would do B10-C10. Is B (or "% correct) minus the result of the formula the deviation for that particular row? Just trying to understand what I'm doing, sorry to be so basic...Also, I would have multiples of this, right? as in, (B1-C1), (B2-C2), etc.?
*In column D, why would I put in =sum D10:D100? 

I attached a screenshot so you can see what I have so far.

Thanks so much!!! And again, sorry to be so basic.

-Chiasmata

----------


## shg

You're much more likely to get useful assistance if you post workbooks rather than pictures.

----------


## chiasmata

> You're much more likely to get useful assistance if you post workbooks rather than pictures.



OK, here you go.

----------


## MrShorty

Sorry, I can't download the sample workbook, in part because I'm working on an older version of Excel and can't open the xlsx file format, so I'll comment based on the screenshot.





> *for column C, do I only put the logistics formula in once or in every cell of the formula, with "2" replaced by "3", "4" etc as I go down the column?



  Yes, the logistics formula gets copied down for each data point in the regression.  Why, because, in a least squares regression, you need to be able to calculate the difference between y(measured) and y(calculated) for each point.  So you have to calculate the regression formula for each data point.





> Also, for column C I'm getting a "cyclical formula" error



  Probably because your spreadsheet is set up differently from the spreadsheet I envisioned.  Where I envisioned the regression parameters in row 2 above the data, you've put the regression parameters in column A below the data.  It should work just fine if you adjust the cell references in my formulas for your spreadsheet.





> I don't understand why I would do B10-C10. Is B (or "% correct) minus the result of the formula the deviation for that particular row?



  B10-C10 is the deviation for that data point (C10 represents % correct as well, doesn't it? It's just the regressed % correct rather than the measured % correct.).  As noted above, this is an essential calculation in a least squares regression.  The basic idea in a least squares regression is to adjust the regression parameters to minimize the sum of the squares of the deviations.

If you don't like calculating, the deviation for each point and summing in separate steps like I've done, there is the SUMXMY2() function that would combine those steps.  I personally like to see each deviation when I do a regression so I can see if the regression routine gave me a reasonable fit or if it converged on something completely wrong.

----------


## shg

See attached. The model is set up in Solver.

----------


## chiasmata

> Sorry, I can't download the sample workbook, in part because I'm working on an older version of Excel and can't open the xlsx file format, so I'll comment based on the screenshot.
> 
>   Yes, the logistics formula gets copied down for each data point in the regression.  Why, because, in a least squares regression, you need to be able to calculate the difference between y(measured) and y(calculated) for each point.  So you have to calculate the regression formula for each data point.
> 
>   Probably because your spreadsheet is set up differently from the spreadsheet I envisioned.  Where I envisioned the regression parameters in row 2 above the data, you've put the regression parameters in column A below the data.  It should work just fine if you adjust the cell references in my formulas for your spreadsheet.
> 
>   B10-C10 is the deviation for that data point (C10 represents % correct as well, doesn't it? It's just the regressed % correct rather than the measured % correct.).  As noted above, this is an essential calculation in a least squares regression.  The basic idea in a least squares regression is to adjust the regression parameters to minimize the sum of the squares of the deviations.
> 
> If you don't like calculating, the deviation for each point and summing in separate steps like I've done, there is the SUMXMY2() function that would combine those steps.  I personally like to see each deviation when I do a regression so I can see if the regression routine gave me a reasonable fit or if it converged on something completely wrong.



Thank you so much for taking the time to explain it all to me!!  :Smilie:

----------


## chiasmata

> See attached. The model is set up in Solver.



WOW, this is incredible! I never used Solver before. The curve looks just like Sigmaplot's curve! Only three tiny questions: (1) is there a way to extrapolate the 75% threshold, like in Sigmaplot? (get the precise x value for a y of 75) and (2) just curious: can there not be a 100% value for the x-axis? it only goes up to 90, and (3) the Sigmaplot curve goes from 96% to 49% and this one goes from 94% to 57%...is it somehow a rougher fit than their curve?

Thank you so much for this!

----------


## MrShorty

> (1) is there a way to extrapolate the 75% threshold, like in Sigmaplot? (get the precise x value for a y of 75)



  2 different methods (not necessarily Excel related)  a).  Algebraically rearrange the logistics function to get x as a function of y.  then plug 75% into this rearranged form of the equation to get x.  or b) solve the equation numerically.  Solver can do this, too.  Set up the logistics function in a cell.  Start solver and tell it to "set target cell" as this cell with the logistics function in it; to a value of 75%; by changing the x cell that this cell refers to.  Just note that the two methods will give slightly different results.  Algebraic rearrangement will give an "exact" answer (to within the limits of double precision arithmetic) where solver will converge on a solution to within the tolerances specified in the solver model.  Usually this difference is small (10^-4 or better).





> (2) just curious: can there not be a 100% value for the x-axis? it only goes up to 90,



  I can't download shg's file, but I would expect this is just a matter of formatting the axis.





> (3) the Sigmaplot curve goes from 96% to 49% and this one goes from 94% to 57%...is it somehow a rougher fit than their curve?



  Welcome to the finer points of data analysis.  A few possible reasons for the differences:
a) different "objective function:"  For the most part, we've (I've) assumed a simple least squares regression, where we minimize the sum of the squares of the deviations [objective function in this case is OF=sum((y-y0)^2)].  There are other OF's that could be chosen, and these would give different results.  Whether a different OF gives a better fit depends on our reasoning for using the specified OF.  Spreadsheets are very flexible, we could program any OF into the spreadsheet we want.
b) additional constraints/information: Sigmaplot may have constraints or other information it is using to modify the regression that we haven't incorporated into the spreadsheet model.  These can be programmed into the spreadsheet once they are known.
c) differences in convergence algorithm: there could be any number of other differences in the algorithms used to optimize the regression.  These differences can give different results.

----------


## shg

In the picture you posted, SigmaPlot is using a very different function:

y = y0 + a / (1 + (x/x0)^b)

That function is undefined for x0=0 (which is its nominal value for the unparameterized logistic function). Dunno why they chose that.

----------


## chiasmata

> In the picture you posted, SigmaPlot is using a very different function:
> 
> y = y0 + a / (1 + (x/x0)^b)
> 
> That function is undefined for x0=0 (which is its nominal value for the unparameterized logistic function). Dunno why they chose that.



Ooooh, I see.

----------


## chiasmata

> 2 different methods (not necessarily Excel related)  a).  Algebraically rearrange the logistics function to get x as a function of y.  then plug 75% into this rearranged form of the equation to get x.  or b) solve the equation numerically.  Solver can do this, too.  Set up the logistics function in a cell.  Start solver and tell it to "set target cell" as this cell with the logistics function in it; to a value of 75%; by changing the x cell that this cell refers to.  Just note that the two methods will give slightly different results.  Algebraic rearrangement will give an "exact" answer (to within the limits of double precision arithmetic) where solver will converge on a solution to within the tolerances specified in the solver model.  Usually this difference is small (10^-4 or better).
> 
>   I can't download shg's file, but I would expect this is just a matter of formatting the axis.
> 
>   Welcome to the finer points of data analysis.  A few possible reasons for the differences:
> a) different "objective function:"  For the most part, we've (I've) assumed a simple least squares regression, where we minimize the sum of the squares of the deviations [objective function in this case is OF=sum((y-y0)^2)]...



Thanks for clarifying all these points. I also thought that it can't really match Sigmaplot's analysis precisely.

----------


## chiasmata

Thanks for helping, everyone!

----------


## m1zz13

Definitely answered my questions shg. Thank you!  :Smilie:

----------


## kd5649

Post Deleted

----------


## MrShorty

kd5649: this forum is rather strict about rule #2 -- don't post your question in another user's thread. Start a new thread and, if you feel this thread is relevant to the discussion, include a link to this thread as part of your description of the question.

----------

