Critical Reviews in Biochemistry and Molecular Biology, 35(5):359-391 (2000)
Beyond Eyeballing: Fitting Models to
Experimental Data
Arthur Christopoulos and Michael J. Lew
Table of Contents

I. Introduction
   A. "Eyeballing"
   B. Models
II. Empirical or Mechanistic?
III. Types of Fitting
   A. Correlation
      1. The Difference Between Correlation and Linear Regression
      2. The Meaning of r2
      3. Assumptions of Correlation Analysis
      4. Misuses of Correlation Analysis
   B. Regression
      1. Linear Regression
      2. Ordinary Linear Regression
      3. Multiple Linear Regression
      4. Nonlinear Regression
      5. Assumptions of Standard Regression Analyses
IV. How It Works
   A. Minimizing an Error Function (Merit Function)
   B. Least Squares
   C. Nonleast Squares
   D. Weighting
   E. Regression Algorithms
V. When to Do It (Application of Curve Fitting Procedures)
   A. Calibration Curves (Standard Curves)
   B. Parameterization of Data (Distillation)
VI. How to Do It
   A. Choosing the Right Model
      1. Number of Parameters
      2. Shape
      3. Correlation of Parameters
      4. Distribution of Parameters
   B. Assessing the Quality of the Fit
      1. Inspection
      2. Root Mean Square
      3. R2 (Coefficient of Determination)
      4. Analysis of Residuals
      5. The Runs Test
   C. Optimizing the Fit
      1. Data Transformations
      2. Initial Estimates
   D. Reliability of Parameter Estimates
      1. Number of Datapoints
      2. Parameter Variance Estimates from Repeated Experiments
      3. Parameter Variance Estimates from Asymptotic Standard Errors
      4. Monte Carlo Methods
      5. The Bootstrap
      6. Grid Search Methods
      7. Evaluation of Joint Confidence Intervals
   E. Hypothesis Testing
      1. Assessing Changes in a Model Fit between Experimental Treatments
      2. Choosing Between Models
VII. Fitting Versus Smoothing
VIII. Conclusion
IX. Software
References
I. INTRODUCTION

A. "Eyeballing"

The oldest and most commonly used tool for examining the relationship between experimental variables is the graphical display. People are very good at recognizing patterns, and can intuitively detect various modes of behavior far more easily from a graph than from a table of numbers. The process of "eyeballing" the data thus represents the experimenter's first attempt at understanding their results and, in the past, has even formed the basis of formal quantitative conclusions. Eyeballing can sometimes be assisted by judicious application of a ruler, and often the utility of the ruler has been enhanced by linearizing data transformations. Nowadays it is more common to use a computer-based curve-fitting routine to obtain an "unbiased" analysis. In some common circumstances there is no important difference in the conclusions that would be obtained by the eye and by the computer, but there are important advantages of the more modern methods in many other circumstances. This chapter will discuss some of those methods, their advantages, and how to choose between them.

B. Models

The modern methods of data analysis frequently involve the fitting of mathematical models to the data. There are many reasons why a scientist might choose to model and many different conceptual types of models. Modeling experiments can be entirely constructed within a computer and used to test "what if" types of questions regarding the underlying mathematical aspects of the system of interest. In one sense, scientists are constructing and dealing with models all the time inasmuch as they form worldview models; experiments are designed and conducted and then used in an intuitive fashion to build a mental picture of what the data may be revealing about the experimental system (see Kenakin, this volume). The experimental results are then frequently analyzed by applying either empirical or mechanistic mathematical models to the data. It is these models that are the subject of this article.

II. EMPIRICAL OR MECHANISTIC?

Empirical models are simple descriptors of a phenomenon that serve to approximate the general shape of the relationship being investigated without any theoretical meaning being attached to the actual parameters of the model. In contrast, mechanistic models are primarily concerned with the quantitative properties of the relationship between the model parameters and its variables, that is, the processes that govern (or are thought to govern) the phenomenon of interest. Common examples of mechanistic models are those related to mass action that are applied to binding data to obtain estimates of chemical dissociation constants, whereas nonmechanistic, empirical models might be any model applied to drug concentration-response curves in order to obtain estimates of drug potency. In general, mechanistic models are often the most useful, as they consist of a quantitative formulation of a hypothesis.1 However, the consequences of using an inappropriate mechanistic model are worse than for empirical models because the parameters in mechanistic models provide information about the quantities and properties of real system components. Thus, the appropriateness of mechanistic models needs close scrutiny.
The designation of a mathematical model as either empirical or mechanistic is based predominantly on the purpose behind fitting the model to experimental data. As such, the same model can be both empirical and mechanistic depending on its context of use. As an example, consider the following form of the Hill equation:

Y = α[A]^S / ([A]^S + K^S)    (1)

This equation is often used to analyze concentration-occupancy curves for the interaction of radioligands with receptors or concentration-response curves for the functional interaction of agonist drugs with receptors in cells or tissues. The Hill equation describes the observed experimental curve in terms of the concentration of drug (A), a maximal asymptote (α), a midpoint location (K), and a midpoint slope (S). In practice, these types of curves are most conveniently visualized on a semi-logarithmic scale, as shown in Figure 1.

FIGURE 1. Concentration-binding (left) and concentration-response (right) curves showing the parameters of the Hill equation (α, K, and S) as mechanistic (left) or empirical (right) model descriptors.

When Hill first derived this equation,2,3 he based it on a mechanistic model for the binding of oxygen to the enzyme, hemoglobin. In that context, the parameters that Hill was interested in, K and S, were meant to reveal specific biological properties about the interaction he was studying; K was a measure of the affinity of oxygen for the enzyme and S was the number of molecules of oxygen bound per enzyme. Subsequent experiments over the years have revealed that this model was inadequate in accounting for the true underlying molecular mechanism of oxygen-hemoglobin binding, but the equation remains popular both as a mechanistic model when its validity is accepted, and as an empirical model where its shape approximates that of experimental data. For instance, if the experimental curve is a result of the direct binding of a radioligand to a receptor, then application of Equation (1) to the dataset can be used to detect whether the interaction conforms to the simplest case of one-site mass-action binding and, if S = 1, the parameters K and α can be used as quantitative estimates of the ligand-receptor dissociation constant (KD) and total density of receptors (Bmax), respectively. This is an example where the Hill equation is a mechanistic equation, because the resulting parameters provide actual information about the underlying properties of the interaction. In contrast, concentration-response curves represent the final element in a series of sequential biochemical cascades that yield the observed response subsequent to the initial mass-action binding of a drug to its receptor. Thus, although the curve often retains a sigmoidal shape that is similar to the binding curve, the Hill equation is no longer valid as a mechanistic equation. Hence, the Hill equation is useful in providing a good fit to sigmoidal concentration-response curves, but the resulting parameters are considered empirical estimates of maximal response, midpoint slope, and midpoint location, and no mechanistic interpretation should be made.
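To make Equation (1) concrete, the following minimal sketch (not part of the original article) simulates a one-site binding experiment and recovers α, K, and S by nonlinear regression; it assumes Python with NumPy and SciPy, and all parameter values and the noise level are arbitrary choices for illustration. The fitting machinery itself is discussed in Section IV.

    import numpy as np
    from scipy.optimize import curve_fit

    def hill(A, alpha, K, S):
        # Equation (1): Y = alpha * [A]^S / ([A]^S + K^S)
        return alpha * A**S / (A**S + K**S)

    rng = np.random.default_rng(0)
    A = np.logspace(-9, -5, 12)                                # ligand concentrations (M)
    Y = hill(A, 100.0, 1e-7, 1.0) + rng.normal(0, 2, A.size)   # simulated binding data

    # initial estimates: curve maximum, a mid-range concentration, unit slope
    popt, pcov = curve_fit(hill, A, Y, p0=[Y.max(), 1e-7, 1.0])
    print("alpha, K, S =", popt)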
III. TYPES OF FITTING

The variables whose relationships can be plotted on Cartesian axes do not necessarily have the same properties. Often one variable is controlled by the experimenter and the other variable is a measurement. Thus one variable has substantially more uncertainty or variability than the other, and traditionally that variable would be plotted on the vertical axis. In that circumstance the Y variable can be called the "dependent" variable because of its dependence on the underlying relationship and on the other variable, which is called "independent" to denote its higher reliability. It is important to note that not all datasets have a clearly independent variable. Historically, the statistical determination of the relationship between two or more dependent variables has been referred to as a correlation analysis, whereas the determination of the relationship between dependent and independent variables has come to be known as a regression analysis. Both types of analyses, however, can share a number of common features, and some are discussed below.

A. Correlation

Correlation is not strictly a regression procedure, but in practice it is often confused with linear regression. Correlation quantifies the degree by which two variables vary together. It is meaningful only when both variables are outcomes of measurement such that there is no independent variable.

1. The Difference between Correlation and Linear Regression

Correlation quantifies how well two dependent variables vary together; linear regression finds the line that best predicts a dependent variable given one or more independent variables, that is, the "line of best-fit".4 Correlation calculations do not find a best-fit straight line.5

2. The Meaning of r2

The direction and magnitude of the correlation between two variables can be quantified by the correlation coefficient, r, whose values can range from −1 for a perfect negative correlation to 1 for a perfect positive correlation. A value of 0, of course, indicates a lack of correlation. In interpreting the meaning of r, a difficulty can arise with values that are somewhere between 0 and −1 or 0 and 1. Either the variables do influence each other to some extent, or they are under the influence of an additional factor or variable that was not accounted for in the experiment and analysis. A better "feel" for the covariation between two variables may be derived by squaring the value of the correlation coefficient to yield the coefficient of determination, or r2 value. This number may be defined as the fraction of the variance in the two variables that is shared, or the fraction of the variance in one variable that is explained by the other (provided the following assumptions are valid). The value of r2, of course, will always be between 0 and 1.
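For readers who want to compute r and r2 directly, a brief illustration (with made-up paired measurements; Python with NumPy is assumed here, though any numerical environment would do):

    import numpy as np

    x = np.array([1.2, 2.1, 2.9, 4.2, 5.1])   # first measured variable
    y = np.array([1.9, 3.2, 4.1, 5.8, 7.3])   # second measured variable

    r = np.corrcoef(x, y)[0, 1]     # correlation coefficient, -1 <= r <= 1
    print("r =", r, "r2 =", r**2)   # r2: fraction of variance shared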
3. Assumptions of Correlation Analysis

1. The subjects are randomly selected from a larger population. This is often not true in biomedical research, where randomization is more common than sampling, but it may be sufficient to assume that the subjects are at least representative of a larger population.
2. The samples are paired, i.e., each experimental unit has both X and Y values.
3. The observations are independent of each other. Sampling one member of the population should not affect the probability of sampling another member (e.g., making measurements in the same subject twice and treating them as separate datapoints; making measurements in siblings).
4. The measurements are independent. If X is somehow involved or connected to the determination of Y, or vice versa, then correlation is not valid. This assumption is very important because artifactual correlations can result from its violation. A common cause of such a problem is where the Y value is expressed as either a change from the X value, or as a fraction of the corresponding X value (Figure 2).
5. The X values were measurements, not controlled (e.g., concentration, etc.). The confidence interval for r2 is otherwise meaningless, and we must then use linear regression.
6. The X and Y values follow a Gaussian distribution.
7. The covariation is linear.

4. Misuses of Correlation Analysis

Often, biomedical investigators are interested in comparing one method for measuring a biological response with another. This usually involves graphing the results as an X,Y plot, but what to do next? It is quite common to see a correlation analysis applied to the two methods of measurement and the correlation coefficient, r, and the resulting P value utilized in hypothesis testing. However, Ludbrook6 has outlined some serious criticisms of this approach, the major one being that although correlation analysis will identify the strength of the linear association between X and Y, as it is intended to do, it will give no indication of any bias between the two methods of measurement. When the purpose of the exercise is to identify and quantify fixed and proportional biases between two methods of measurement, then correlation analysis is inappropriate, and a technique such as ordinary or weighted least products regression6 should be used.

FIGURE 2. An apparent correlation between two sets of unrelated random numbers (pseudorandom numbers generated with mean = 5 and standard deviation = 1) comes about where the Y value is expressed as a function of the X value (here each Y value is expressed as a fraction of the corresponding X value).
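The artifact illustrated in Figure 2 is easy to reproduce. In this sketch (values chosen to mirror the figure legend: pseudorandom numbers with mean 5 and standard deviation 1), X and Y are generated independently, yet correlating X with Y/X manufactures a strong, and entirely spurious, negative correlation:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(5, 1, 100)   # unrelated pseudorandom numbers
    y = rng.normal(5, 1, 100)

    print("r(x, y)   =", np.corrcoef(x, y)[0, 1])       # near 0, as expected
    print("r(x, y/x) =", np.corrcoef(x, y / x)[0, 1])   # strongly negative: artifact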
B. Regression

The actual term "regression" is derived from the Latin word "regredi", and means "to go back to" or "to retreat". Thus, the term has come to be associated with those instances where one "retreats" or "resorts" to approximating a response variable with an estimated variable based on a functional relationship between the estimated variable and one or more input variables. In regression analysis, the input (independent) variables can also be referred to as "regressor" or "predictor" variables.

1. Linear Regression

The most straightforward methods for fitting a model to experimental data are those of linear regression. Linear regression involves specification of a linear relationship between the dependent variable(s) and certain properties of the system under investigation. Surprisingly though, linear regression deals with some curves (i.e., nonstraight lines) as well as straight lines, with regression of straight lines being in the category of "ordinary linear regression" and curves in the category of "multiple linear regression" or "polynomial regression".

2. Ordinary Linear Regression

The simplest general model for a straight line includes a parameter that allows for inexact fits: an "error" parameter, which we will denote as ε. Thus we have the formula:

Y = α + βX + ε    (2)

The parameter, α, is a constant, often called the "intercept", while β is referred to as a regression coefficient that corresponds to the "slope" of the line. The additional parameter, ε, accounts for the type of error that is due to random variation caused by experimental imprecision, or simple fluctuations in the state of the system from one time point to another. This error term is sometimes referred to as the stochastic component of the model, to differentiate it from the other, deterministic, component of the model (Figure 3).7 When data are fitted to the actual straight-line model, the error term denoted by ε is usually not included in the fitting procedure, so that the output of the regression forms a perfect straight line based solely on the deterministic component of the model. Nevertheless, the regression procedure assumes that the scatter of the datapoints about the best-fit straight line reflects the effects of the error term, and it is also implicitly assumed that ε follows a Gaussian distribution with a mean of 0. This assumption is often violated, however, and the implications are discussed elsewhere in this article. For now, however, we will assume that the error is Gaussian; Figure 4 illustrates the output of the linear model with the inclusion of the error term. Note that the Y values of the resulting "line" are randomly distributed above and below the ideal (dashed) population line defined by the deterministic component of the model.

FIGURE 3. The simple linear population model equation indicating the deterministic component of the model that is precisely determined by the parameters α and β, and the stochastic component of the model, ε, that represents the contribution of random error to each determined value of Y.

FIGURE 4. A linear model that incorporates a stochastic (random error) component. The dashed line is the deterministic component, whereas the points represent the effect of random error [denoted by the symbol ε in Equation (2)].
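Because the sum-of-squares merit function for a straight line can be minimized analytically (see Section IV), the least squares estimates of α and β in Equation (2) can be obtained in one step. A brief sketch with invented data:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])

    # design matrix [1, x]: solving the linear system gives alpha and beta
    X = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    print("alpha =", alpha, "beta =", beta)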
3. Multiple Linear Regression

The straight line equation [Equation (2)] is the simplest form of the linear regression model, because it only includes one independent variable. When the relationship of interest can be described in terms of more than one independent variable, the regression is then defined as "multiple linear regression". The general form of the linear regression model may thus be written as:

Y = α + β1X1 + β2X2 + … + βiXi + ε    (3)

where Y is the dependent variable, and X1, X2 … Xi are the (multiple) independent variables. The output of this model can deviate from a straight line, and one may thus question the meaning of the word "linear" in "linear regression". Linear regression implies a linear relationship between the dependent variable and the parameters, not the independent variables of the model. Thus Equation (3) is a linear model because the parameters α, β1, β2 … βi have the (implied) exponent of unity. Multiple linear regression models also encompass polynomial functions:

Y = α + β1X + β2X^2 + … + βiX^i + ε    (4)

The equation for a straight line [Equation (2)] is a first-order polynomial. The quadratic equation, Y = α + β1X + β2X^2, is a second-order polynomial, whereas the cubic equation, Y = α + β1X + β2X^2 + β3X^3, is a third-order polynomial. Each of these higher order polynomial equations defines curves, not straight lines. Mathematically, a linear model can be identified by taking the first derivative of its deterministic component with respect to the parameters of the model. The resulting derivatives should not include any of the parameters; otherwise, the model is said to be "nonlinear". Consider the following second-order polynomial model:

Y = α + β1X + β2X^2    (5)

Taking first derivatives with respect to each of the parameters yields:

∂Y/∂α = 1    (6)

∂Y/∂β1 = X    (7)

∂Y/∂β2 = X^2    (8)

The model is linear because the first derivatives do not include the parameters. As a consequence, taking the second (or higher) order derivative of a linear function with respect to its parameters will always yield a value of zero.8 Thus, if the independent variables and all but one parameter are held constant, the relationship between the dependent variable and the remaining parameter will always be linear.

It is important to note that linear regression does not actually test whether the data sampled from the population follow a linear relationship. It assumes linearity and attempts to find the best-fit straight line relationship based on the data sample.
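Linearity in the parameters is exactly what makes polynomial fitting a one-step problem: Equation (5) traces a curve, but α, β1, and β2 still enter with exponents of unity, so a second-order polynomial can be fitted with the same linear algebra as a straight line. A sketch with illustrative data:

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 5.0, 20)
    y = 1.0 + 2.0 * x - 0.3 * x**2 + rng.normal(0, 0.2, x.size)

    # columns 1, x, x^2 turn Equation (5) into an ordinary linear problem
    X = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("alpha, beta1, beta2 =", coef)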
4. Nonlinear Regression

Because there are so many types of nonlinear relationships, a general model that encompasses all their behaviors cannot be defined in the sense used above for linear models, so we will define an explicit nonlinear function for illustrative purposes. In this case, we will use the Hill equation [Equation (1); Figure 1], which contains one independent variable, [A], and 3 parameters, α, K, and S. Differentiating Y with respect to each model parameter yields the following:

∂Y/∂α = [A]^S / ([A]^S + K^S)    (9)

∂Y/∂K = −αS(K[A])^S / (K([A]^S + K^S)^2)    (10)

∂Y/∂S = α(K[A])^S ln([A]/K) / ([A]^S + K^S)^2    (11)

All derivatives involve at least two of the parameters, so the model is nonlinear. However, it can be seen that the partial derivative in Equation (9) does not contain the parameter, α. A linear regression of Y on [A]^S/(K^S + [A]^S) will thus allow the estimation of α. Because this last (linear) regression is conditional on knowing the values of K and S, α is referred to as a "conditionally linear" parameter. Nonlinear models that contain conditionally linear parameters have some advantages when it comes to actual curve fitting.7
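The conditional linearity of α can be checked numerically: if K and S are held at assumed values, α is recovered by a one-step linear regression of Y on [A]^S/([A]^S + K^S). The sketch below (simulated data; parameter values are arbitrary) uses the closed-form least squares slope for a line through the origin:

    import numpy as np

    def hill(A, alpha, K, S):
        return alpha * A**S / (A**S + K**S)

    rng = np.random.default_rng(3)
    A = np.logspace(-9, -5, 12)
    Y = hill(A, 100.0, 1e-7, 1.0) + rng.normal(0, 2, A.size)

    K, S = 1e-7, 1.0                       # suppose K and S are known
    x = A**S / (A**S + K**S)               # regressor suggested by Equation (9)
    alpha = np.sum(x * Y) / np.sum(x**2)   # least squares slope through the origin
    print("alpha =", alpha)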
5. Assumptions of Standard Regression Analyses4,7

1. The subjects are randomly selected from a larger population. The same caveats apply here as with correlation analyses.
2. The observations are independent.
3. X and Y are not interchangeable. Regression models used in the vast majority of cases attempt to predict the dependent variable, Y, from the independent variable, X, and assume that the error in X is negligible. In special cases where this is not the case, extensions of the standard regression techniques have been developed to account for nonnegligible error in X.
4. The relationship between X and Y is of the correct form, i.e., the expectation function (linear or nonlinear model) is appropriate to the data being fitted.
5. The variability of values around the line is Gaussian.
6. The values of Y have constant variance. Assumptions 5 and 6 are often violated (most particularly when the data have variance where the standard deviation increases with the mean) and have to be specifically accounted for in modifications of the standard regression procedures.
7. There are enough datapoints to provide a good sampling of the random error associated with the experimental observations. In general, the minimum number of independent points can be no less than the number of parameters being estimated, and should ideally be significantly higher.

IV. HOW IT WORKS

A. Minimizing an Error Function (Merit Function)

The goal of both linear and nonlinear regression procedures is to derive the "best fit" of a particular model to a set of experimental observations. To obtain the best-fit curve we have to find parameter values that minimize the difference between the observed experimental observations and the chosen model. This difference is assumed to be due to the error in the experimental determination of the datapoints, and thus it is common to see the entire model-fitting process described in terms of "minimization" of an "error function" or minimization of a "merit function".9

The most common representation ("norm") of the merit function for regression models is based on the chi-square distribution. This distribution and its associated statistic, χ2, have long been used in the statistical arena to assess "goodness-of-fit" with respect to identity between observed and expected frequencies of measures. Because regression analyses also involve the determination of the best model estimates of the dependent variables based on the experimentally observed dependent variables, it is quite common to see the function used to determine the best fit of the model parameters to the experimental data referred to as the "χ2 function", and the procedure referred to as "chi-square fitting".9

B. Least Squares

The most widely used method of parameter estimation from curve fitting is the method of least squares. To explain the principle behind least squares methods, we will use an example, in this case the simple linear model. Theoretically, finding the slope, β, and intercept, α, parameters for a perfect straight line is easy: any two X,Y pairs of points can be utilized in the familiar "rise-over-run" formulation to obtain the slope parameter, which can then be inserted into the equation for the straight line to derive the intercept parameter. In reality, however, experimental observations that follow linear relationships almost never fall exactly on a straight line due to random error. The task of finding the parameters describing the line is thus no longer simple; in fact, it is unlikely that values for α and β defined by any pair of experimental points will describe the best line through all the points. This is illustrated in Figure 5; although the dataset appears to follow a linear relationship, it can be seen that different straight lines, each characterized by different slopes and intercepts, are derived depending on which two X,Y pairs are used.

What is needed, therefore, is a "compromise" method for obtaining an objective best fit. We begin with our population model [Equation (2)]:
Y = α + βX + ε

and derive an equation that is of the same form:

Ŷ = α̂ + β̂X    (12)

where Ŷ is the predicted response and α̂ and β̂ are the estimates of the population intercept and slope parameters, respectively. The difference between the response variable, Y, and its predictor, Ŷ, is called the "residual", and its magnitude is therefore a measure of how well Ŷ predicts Y. The closer the residual is to a value of zero for each experimental point, the closer the predicted line will be to that point. However, because of the error in the data (the ε term in the population model), no prediction equation will fit all the datapoints exactly and, hence, no equation can make the residuals all equal zero. In the example above, each straight line will yield a residual of zero for two points, but a nonzero residual for the other two points; Figure 6 illustrates this for one of the lines.

FIGURE 5. All possible straight lines that can be drawn through a four-point dataset when only two points are used to define each line.

FIGURE 6. A combination of zero and nonzero residuals. The dataset is the same as in Figure 5, with only one of the lines now drawn through the points. The vertical distance of each point from the line (indicated by the arrows) is defined as the "residual".

A best-fit compromise is found by minimizing the sum of the squares of the residuals, hence the name "least squares". Mathematically, the appropriate merit function can be written as:
χ2 = Σ_{i=1}^{N} [(Yi − f(Xi, θ)) / wi]^2 = Σ_{i=1}^{N} (ri / wi)^2    (13)

where χ2 is the weighted sum of the squares of the residuals (ri) and is a function of the parameters (the vector, θ) and the N datapoints, (Xi, Yi). The term, wi, is the statistical weight (see below) of a particular datapoint, and when used, most often relates to the standard error of that point. For standard (unweighted) least squares procedures such as the current example, wi equals 1. The least squares fit of the dataset outlined above is shown in Figure 7. Note that the best-fit straight line yields nonzero residuals for three of the four datapoints. Nevertheless, the resulting line is based on parameter estimates that give the smallest sum-of-squares of those residuals.

FIGURE 7. The minimized least squares fit of the straight line model [Equation (2)] to the dataset shown in Figures 5 and 6.

Why do we use the sum of the squares of the residuals and not another norm of the deviation, such as the average of the absolute values of the residuals? Arguably, simply because of convention! Different norms of deviation have different relative sensitivities to small and large deviations, and conventional usage suggests that sums of the squared residuals represent a sensible compromise.4,10 The popularity of least squares estimators may also be based on the fact that they are relatively easy to determine and that they are accurate estimators if certain assumptions are met regarding the independence of errors and a Gaussian distribution of errors in the data.8,9,11 Nonetheless, for extremely large deviations due to outlier points, least squares procedures can fail in providing a sensible fit of the model to the data.

Although the example used above was based on a linear model, nonlinear least squares follow the same principles as linear least squares and are based on the same assumptions. The main difference is that the sum-of-squares merit function for linear models is well-behaved and can be solved analytically in one step, whereas for nonlinear models, iterative or numerical procedures must be used instead.

In most common applications of the least squares method to linear and nonlinear models, it is assumed that the majority of the error lies in the dependent variable. However, there can be circumstances when both X and Y values are attended by random error, and different fitting approaches are warranted. One such approach has been described by Johnson,12 and is particularly useful for fitting data to nonlinear models. In essence, Johnson's method utilizes a form of the standard χ2 merit function, given above, that has been expanded to include the "best-fit" X value and its associated variance. The resulting merit function is then minimized using an appropriate least squares curve fitting algorithm.
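Equation (13) translates directly into code. The sketch below (illustrative only) evaluates the weighted sum of squared residuals for a candidate parameter vector; with all wi = 1 it reduces to the unweighted sum-of-squares minimized in the example above:

    import numpy as np

    def chi2(theta, x, y, model, w=None):
        # Equation (13): sum over i of ((Yi - f(Xi, theta)) / wi)^2
        w = np.ones_like(y) if w is None else w
        r = y - model(x, *theta)          # residuals
        return np.sum((r / w) ** 2)

    line = lambda x, a, b: a + b * x      # Equation (2), deterministic part
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])
    print("chi2 =", chi2([0.2, 1.9], x, y, line))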
C. Nonleast Squares

Cornish-Bowden11 has listed the minimal requirements for optimal behavior of the least squares method:

a. Correct choice of model.
b. Correct data weighting is known.
c. Errors in the observations are independent of one another.
d. Errors in the observations are normally distributed.
e. Errors in the observations are unbiased (have zero mean).

And we can add:

f. None of the datapoints are erroneous (outliers).

Often, however, the requirements for optimal behavior cannot be met. Other techniques are available for deriving parameter estimates under these circumstances, and they are generally referred to as "robust estimation" or "robust regression" techniques. Because the word "robustness" has a particular connotation, it is perhaps unfair to class all of the diverse nonleast squares procedures under the same umbrella. Overall, however, the idea behind robust estimators is that they are more insensitive to deviations from the assumptions that underlie the fitting procedure than least squares estimators.

"Maximum likelihood" calculations are one class of robust regression techniques that are not based on a Gaussian distribution of errors. In essence, regression procedures attempt to find a set of model parameters that generate a curve that best matches the observed data. However, there is no way of knowing which parameter set is the correct one based on the (sampled) data, and thus there is no way of calculating a probability for any set of fitted parameters being the "correct" set. Maximum likelihood calculations work in the opposite direction, that is, given a particular model with a particular set of parameters, maximum likelihood calculations derive a probability for the data being obtained. This (calculated) probability of the data, given the parameters, can also be considered to be the likelihood of the parameters, given the data.9 The goal is then to fit for a set of parameters that maximize this likelihood, hence the term "maximum likelihood", and the calculations attempt to find the regression that has the maximum likelihood of producing the observed dataset. It has been pointed out that there is no formal mathematical basis for the maximum likelihood procedure and, because maximum likelihood calculations are quite involved, they are not routinely utilized explicitly.9 Fortunately, the simpler least squares methods described above are equivalent to maximum likelihood calculations where the assumptions of linear and nonlinear regression (particularly the independence and Gaussian distribution of the errors in the data) are valid.8,9,11

Certain robust regression techniques focus on using measures of central tendency other than the mean as the preferred statistical parameter estimator. For instance, Cornish-Bowden11 has described how the median is more insensitive to outlier points in linear regression and certain cases of nonlinear regression than the mean. A drawback of this approach, however, is that it quickly becomes cumbersome when extended to more complex linear problems.
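Robust alternatives are also available in standard numerical libraries, so they need not be coded from scratch. As a hedged illustration (the data and the choice of loss function are arbitrary), scipy.optimize.least_squares accepts loss functions such as "soft_l1" that downweight large residuals; with a gross outlier present, the robust fit stays much closer to the underlying line than the plain least squares fit:

    import numpy as np
    from scipy.optimize import least_squares

    x = np.arange(1.0, 9.0)
    y = 1.0 + 2.0 * x
    y[5] += 15.0                           # introduce a gross outlier

    residuals = lambda p: (p[0] + p[1] * x) - y

    plain = least_squares(residuals, x0=[0.0, 1.0])                   # least squares
    robust = least_squares(residuals, x0=[0.0, 1.0], loss="soft_l1")  # robust loss
    print("plain :", plain.x)
    print("robust:", robust.x)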
D. Weighting

The simplest minimization functions make no distinction between different experimental points, and assume that each observation contributes equally to the estimation of model parameters. This is appropriate when the variance of all the observations is uniform, and the error is referred to as homoscedastic. However, in reality it is common that different points have different variances associated with them, with the result that the points with the most variance may have an undue influence on the parameters obtained from an unweighted curve fit. For example, results from many biological experiments are often expressed as a change from a baseline value, with the consequence that the points near the baseline become small numbers (near zero) with a low variance. Points representing larger responses will naturally have a larger variance, a situation that can be described as heteroscedasticity. An unweighted curve fit through heteroscedastic data will allow the resulting curve to deviate from the well-defined (tight) near-zero values to improve the fit of the larger, less well-defined values. Clearly it would be better to have the fit place more credence in the more reliably estimated points, something that can be achieved in a weighted curve fit.

Equation (13) was used previously to define the general, least squares, minimization function. There are a number of variations available for this function that employ differential data weighting.13 These functions explicitly define a value for the wi term in Equation (13). For instance, if wi = 1 or a constant, then the weighting is said to be "uniform"; if wi = Yi, then

χ2 = Σ_{i=1}^{N} (ri / Yi)^2 = Σ_{i=1}^{N} (1/Yi^2)(ri)^2

and the weighting is said to be "relative". Relative weighting is also referred to as "weighting by 1/Y2" and is useful where the experimental uncertainty is a constant fraction of Y. For example, counts of radioactive decay will have variances described by the Poisson distribution, where the variance scales with the mean, and thus the likely error in each estimate is a constant percentage of counts rather than a constant value for any number of counts. Thus, a curve fit allowing for relative weighting can adjust for the resulting heteroscedastic variance. Another useful weighting value is wi = √Yi. This yields "weighting by 1/Y" and is appropriate, for example, when most of the experimental uncertainty in the dependent variable is due to some sort of counting error.5 Other weighting schemes utilize the number of replicates that are measured for each value of Y to determine the appropriate weight for the datapoints.13

E. Regression Algorithms

What are the actual "mechanics" that underlie the χ2 minimization process behind least squares regression techniques? The χ2 merit function for linear models (including polynomials) is quadratic in nature, and is thus amenable to an exact analytical solution. In contrast, nonlinear problems must be solved iteratively, and this procedure can be summarized as follows (a naive implementation is sketched after the list):

a. Define the merit function.
b. Start with a set of initial estimates (guesses) of the regression parameters and determine the value of the merit function for this set of estimates.
c. Adjust the parameter estimates and recalculate the merit function. If the merit function is improved, then keep the parameter values as new estimates.
d. Repeat step c (each repeat is an "iteration"). When further iterations yield a negligible improvement in the fit, stop adjusting the parameter estimates and generate the curve based on the last set of estimates.
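Steps a to d can be written out explicitly. The following naive sketch (for illustration only; the production algorithms described next adjust the parameters far more intelligently) perturbs one parameter at a time, keeps any change that lowers the sum-of-squares merit function, and halves the step size until further improvement is negligible:

    import numpy as np

    def fit_iteratively(model, x, y, theta, step=0.5, tol=1e-8):
        sse = lambda t: np.sum((y - model(x, *t)) ** 2)   # step a: merit function
        best = sse(theta)                                 # step b: initial estimates
        while step > tol:
            improved = False
            for i in range(len(theta)):
                for delta in (step, -step):               # step c: adjust and retest
                    trial = theta.copy()
                    trial[i] += delta
                    if sse(trial) < best:
                        theta, best, improved = trial, sse(trial), True
            if not improved:
                step /= 2.0                               # step d: stop when gain is negligible
        return theta

    line = lambda x, a, b: a + b * x
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])
    print(fit_iteratively(line, x, y, [0.0, 1.0]))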
The rules for adjusting the parameters of the nonlinear model are based on matrix algebra and are formulated as computer algorithms. The merit function can be viewed as a multidimensional surface that has all possible sum-of-squares values as one plane and all possible values of each of the model parameters as the other planes. This surface may thus vary from a smooth, symmetrical shape to one characterized by many crests and troughs. The role of the nonlinear regression algorithm is to work its way down this surface to the deepest trough, which should then correspond to the set of model parameters that yield the minimum sum-of-squares value.

There are a number of different algorithms that have been developed over the years, and they all have their pros and cons. One of the earliest algorithms is the method of steepest descent (or the gradient search method8). This method proceeds down the steepest part of the multidimensional merit function surface in fixed step lengths that tend to be rather small.9 At the end of each iteration, a new slope is calculated and the procedure repeated. Many iterations are required before the algorithm converges on a stable set of parameter values. This method works well in the initial iterations, but tends to drag as it approaches a minimum value.13

The Gauss-Newton method is another algorithm that relies on a linear approximation of the merit function. By making this approximation, the merit function approaches a quadratic, its surface becomes a symmetrical ellipsoid, and the iterations of the Gauss-Newton algorithm allow it to converge toward a minimum much more rapidly than the method of steepest descent. The Gauss-Newton method works best when it is employed close to the surface minimum, because at this point most merit functions are well approximated by linear (e.g., quadratic) functions.9 In contrast, the Gauss-Newton method can work poorly in initial iterations, where the likelihood of finding a linear approximation to the merit function is decreased.

A method exploiting the best features of the methods of steepest descent and Gauss-Newton was described by Marquardt, based on an earlier suggestion by Levenberg,9 and the resulting algorithm is thus often referred to as the Levenberg-Marquardt method. Marquardt realized that the size of the increments in an iterative procedure poses a significant scaling problem for any algorithm, and proceeded to refine the scaling issue and derive a series of equations that can approximate the steepest descent method at early iterations and the Gauss-Newton method at later stages closer to the minimum. The Levenberg-Marquardt method (sometimes simply referred to as the Marquardt method) has become one of the most widespread algorithms used for computerized nonlinear regression.

Another type of algorithm that is geometric rather than numeric in nature is the Nelder-Mead Variable Size Simplex method.8,14 Unlike the methods outlined above, this method does not require the calculation of any derivatives. Instead, this algorithm depends on the generation of a number of starting points, called "vertices", based on initial estimates for each parameter of the model, as well as an initial increment step. The vertices form a multidimensional shape called a "simplex". The goodness of fit is evaluated at each vertex in the simplex, the worst vertex is rejected, and a new one is generated by combining desirable features of the remaining vertices. This is repeated in an iterative fashion until the simplex converges to a minimum. The big advantage of the Nelder-Mead method is that it is very successful in converging to a minimum; its main disadvantage is that it does not provide any information regarding the errors associated with the final parameter estimates.8
V. WHEN TO DO IT (APPLICATION OF CURVE FITTING PROCEDURES)

A. Calibration Curves (Standard Curves)

Calibration curves are most convenient when they are linear, but even for assays where a linear relationship is expected on theoretical grounds, nonlinear curves can result from instrumentation nonlinearities and other factors. The equation of a curve fitted through the calibration data will allow convenient conversion between the raw measurement and the required value. In cases where there is no theoretical basis for choosing one model over another, calibration curves can be considered to be a smoothing rather than a real fitting problem, and one might decide to apply a polynomial model to the data because of the availability of an analytical solution. In such a case the order of the chosen polynomial would need to be low so that noise in the calibration measurements is not converted into wobbles on the calibration curve.

B. Parameterization of Data (Distillation)

It is often desirable to describe data in an abbreviated way. An example of this is the need to summarize a concentration-response curve into a potency estimate and maximum response value. These parameters are easily obtained by eyeballing the data, but an unbiased estimate from an empirical curve fit is preferable and probably more acceptable to referees!

VI. HOW TO DO IT

A. Choosing the Right Model

1. Number of Parameters

The expectation function should include the minimum number of parameters that adequately define the model and that allow for a successful convergence of the fit.

If a model is overparameterized, it is considered to possess "redundant parameters" (often used interchangeably with the term "redundant variables"), and the regression procedure will either fail or yield meaningless parameter estimates. Consider the "operational model" of Black and Leff.15 This is a model that is often used in pharmacological analyses to describe the concentration-response relationship of an agonist (A) in terms of its affinity (dissociation constant) for its receptor (KA), its "operational efficacy" (τ), and the maximum response (Em) that the tissue can elicit. One common form of the model is:

E = Em · τ · [A] / (([A] + KA) + τ · [A])    (14)

where E denotes the observed effect. Figure 8 shows a theoretical concentration-response curve, plotted in semilogarithmic space, that illustrates the relationship between the operational model parameters and the maximal asymptote (α) and midpoint location (EC50) of the resulting sigmoidal curve. A concentration-response curve like the one in Figure 8 can be successfully fitted using the two-parameter version of the Hill equation, which describes the curve in terms of only the EC50 and α (the slope being equal to 1):

E = α · [A] / ([A] + EC50)    (15)
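The composite nature of the Hill parameters can be verified numerically. In this sketch (all parameter values are arbitrary assumptions), a curve is simulated from the operational model [Equation (14)] and fitted with the two-parameter Hill equation [Equation (15)]; simple rearrangement of Equation (14) predicts that the recovered asymptote and midpoint should equal the composites Em·τ/(1 + τ) and KA/(1 + τ), which is what the fit returns. A direct three-parameter fit of Equation (14) to the same single curve would not be uniquely determined:

    import numpy as np
    from scipy.optimize import curve_fit

    def operational(A, Em, KA, tau):
        # Equation (14)
        return Em * tau * A / ((A + KA) + tau * A)

    def hill2(A, alpha, EC50):
        # Equation (15): Hill equation with slope fixed at 1
        return alpha * A / (A + EC50)

    Em, KA, tau = 100.0, 1e-6, 3.0
    rng = np.random.default_rng(4)
    A = np.logspace(-9, -4, 15)
    E = operational(A, Em, KA, tau) + rng.normal(0, 1, A.size)

    popt, _ = curve_fit(hill2, A, E, p0=[E.max(), 1e-6])
    print("fitted alpha, EC50 :", popt)
    print("Em*tau/(1+tau)     :", Em * tau / (1 + tau))   # composite asymptote
    print("KA/(1+tau)         :", KA / (1 + tau))         # composite midpoint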
FIGURE 8. The relationship between the Hill equation [Equation (15)] parameters, α and EC50, and the operational model [Equation (14)] parameters KA, τ, and Em, in the description of a concentration-response curve of an agonist drug. It can be seen that each parameter of the Hill equation is composed of two operational model parameters.

However, it can be seen in Figure 8 that the midpoint and maximal asymptote of the curve are related to the operational model in a more complicated manner; each parameter of the sigmoidal Hill equation is comprised of two operational model parameters. If someone were to try directly fitting Equation (14) to this curve in order to derive individual estimates of Em, KA, and τ, they would be unsuccessful. As it stands, the operational model is overparameterized for fitting to a single curve; the regression algorithm simply will not be able to apportion meaningful estimates between the individual operational model parameters as it tries to define the midpoint and maximal asymptote of the concentration-response curve. In practice, the successful application of the operational model to real datasets requires additional experiments to be incorporated in the curve fitting process that allow for a better definition of the individual model parameters.16,17

2. Shape

When fitting empirical models to data, the most important feature of the model must be that its shape should be similar to the data. This seems extraordinarily obvious, but very little exploration of the literature is needed to find examples where the curve and the data have disparate shapes! Empiricism allows one a great deal of freedom in choosing models, and experimenters should not be overly shy of moving away from the most common models (e.g., the Hill equation) when their data ask for it. Even for mechanistic models it is important to look for a clear shape match between the model and data: a marked difference can only mean that the model is inappropriate or the data of poor quality.

Perhaps the only feature that practically all biological responses have in common is that they can be approximated by nonlinear, saturating functions. When plotted on a logarithmic concentration scale, responses usually lie on a sigmoid curve, as shown in Figures 1 and 8, and a number of functions have been used in the past to approximate the general shape of such responses. Parker and Waud,18 for instance, have highlighted that the rectangular hyperbola, the integral of the Gaussian distribution curve, the arc-tangent, and the logistic function have all been used by various researchers to empirically fit concentration-response data. Some of these functions are more flexible than others; for instance, the rectangular hyperbola has a fixed slope of 1. In contrast, the logistic equation has proven very popular in the fitting of concentration-response data:
E = 1 / (1 + e^−(α + βX))    (16)

Part of the popularity of this equation is its flexibility and its ability to match the parameters of the Hill equation [Equation (1)] for empirical fitting purposes.

In general, the correct choice of expectation function is most crucial when fitting mechanistic models. The difficulty in ascertaining the validity of the underlying model in these cases arises because the curve fitting process is undertaken with the automatic assumption that the model is a plausible one prior to actually fitting the model and applying some sort of diagnostics to the fit (see Assessing the Quality of the Fit, below). We must always remain aware, therefore, that we will never really know the "true" model, but can at least employ a reasonable one that accommodates the experimental findings and, importantly, allows for the prediction of testable hypotheses. From a practical standpoint, this may be seen as having chosen the "right" mechanistic model.

3. Correlation of Parameters

When a model, either mechanistic or empirical, is applied to a dataset, we generally consider each of the parameters to be responsible for a single property of the curve. Thus, in the Hill equation, there is a slope parameter, S, a parameter for the maximum asymptote (α), and a parameter for the location (K or EC50). Ideally, each of these parameters would be entirely independent, so that error or variance in one does not affect the values of the others. Such a situation would mean that the parameters are entirely uncorrelated. In practice it is not possible to have uncorrelated parameters (see Figure 9), but the parameters of some functions are less correlated than others. Strong correlations between parameters reduce the reliability of their estimation as well as making any estimates from the fit of their variances overly optimistic.19

FIGURE 9. Altered estimates of the maximal asymptote, α, and the slope, S, obtained by fitting the Hill equation to logistic data where the parameter K (log K) was constrained to differ from the correct value. The systematic relationship between the error in K and the values of the parameters S and α indicates that each is able to partially correct for error in K and thus both are correlated with K.

4. Distribution of Parameters

The operational model example can also be used to illustrate another practical consideration when entering equations for curve fitting, namely the concept of "reparameterization".13 When fitting the operational model or the Hill equation to concentration-response curves, the parameters may be entered in the equation in a number of ways; for instance, the EC50 is commonly entered as 10^LogEC50.
This reparameterization means that the regression algorithm will actually provide the best-fit estimate of the logarithm of the EC50. Why reparameterize? As mentioned earlier, many of the assumptions of nonlinear regression rely on a Gaussian distribution of experimental uncertainties. Many model parameters, including the EC50 of the Hill equation, the dissociation constant of a hyperbolic radioligand binding equation, and the τ parameter of the operational model, follow an approximately Gaussian distribution only when transformed into logarithms.17 Thus, although not particularly important for the estimation of the parametric value, reparameterization can improve the validity of statistical inferences made from nonlinear regression algorithms.13 Other examples of reparameterizations that can increase the statistical reliability of the estimation procedure include recasting time parameters as reciprocals and counts of radioactive decay as square roots.5

B. Assessing the Quality of the Fit

The final determination of how "appropriate" the fit of a dataset is to a model will always depend on a number of factors, including the degree of rigor the researcher actually requires. Curve fitting for the determination of standard curves, for instance, will not warrant the same diagnostic criteria one may apply to a curve fit of an experimental dataset that was designed to investigate a specific biological mechanism. In the case of standard curves, an eyeball inspection of the curve superimposed on the data is usually sufficient to indicate the reliability of the fit for that specific purpose. However, when the fitting of models to experimental data is used to provide insight into underlying biological mechanisms, the ability to ascribe a high degree of appropriateness to the resulting curve fit becomes paramount.

1. Inspection

Although usually sufficient for empirical models, an initial test for conformity of the data to any selected model is a simple inspection of the curve fit superimposed on the data. Although rudimentary, this procedure is quite useful in highlighting really bad curve fits, i.e., those that are almost invariably the consequence of having inadvertently entered the wrong equation or of setting certain parameter values to a constant value when they should have been allowed to vary as part of the fitting process. Assuming that visual inspection does not indicate a glaring inconsistency of the model with the data, there are a number of statistical procedures that can be used to quantify the goodness of the fit.

2. Root Mean Square

Figure 10 shows a schematic of an experimental dataset consisting of 6 observations (open circles labeled obs1-obs6) and the superimposed best-fit of a sigmoidal concentration-response model [Equation (15)] to the data. The solid circles (exp1-exp6) represent the expected response corresponding to each X-value used for the determination of obs1-obs6, derived from the model fit. The sum of the squared residuals, i.e., the sum of the squared differences between the observed and expected responses, has also been defined as the Error Sum of Squares (SSE), and it is this quantity that most researchers think of when discussing the sum-of-squares derived from their curve fitting exercises [see Equation (13)]:
SSE = (obs1 − exp1)² + (obs2 − exp2)² + … + (obs6 − exp6)²   (17)

FIGURE 10. Relationship between a set of experimental observations (open circles; obs1-obs6) and their corresponding least squares estimates (solid circles; exp1-exp6). The horizontal dashed line represents the average of all the experimental observations (obsav).

The SSE is sometimes used as an index of goodness-of-fit; the smaller the value, the better the fit. However, in order to use this quantity more effectively, an allowance must also be made for the "degrees of freedom" of the curve fit. For regression procedures, the degrees of freedom equal the total number of datapoints minus the number of model parameters that are estimated. In general, the more parameters that are added to a model, the greater the likelihood of observing a very close fit of the regression curve to the data, and thus a smaller SSE. However, this comes at the cost of degrees of freedom. The "mean square error" (MSE) is defined as the SSE divided by the degrees of freedom (df):

MSE = SSE / df   (18)

Finally, the square root of MSE is equal to the root mean square, RMS:

RMS = √(SSE / df)   (19)

The RMS (sometimes referred to as Sy.x) is a measure of the standard deviation of the residuals. It should be noted, however, that although the RMS is referred to as the "standard deviation" or "standard error" of the model, this should not be confused with the standard deviation or error associated with the individual parameter estimates. The degree of uncertainty associated with any model parameter is derived by other methods (see below).

3. R2 (Coefficient of Determination)

Perhaps more common than the RMS, the R2 value is often used as a measure of goodness of fit. Like the r2 value from linear regression or correlation analyses, the value of R2 can range from 0 to 1; the closer to 1 this value is, the closer the model fits the dataset. To understand the derivation of R2, it is important to first appreciate the other "flavors" of sums-of-squares that crop up in the mathematics of regression procedures in addition to the well-known SSE. Using Figure 10 again as an example, the sum of the squared differences between each observed response and the average of all responses (obsav) is defined as the Total Sum of Squares (SST; sometimes denoted as Syy):
SST = (obs1 − obsav)² + (obs2 − obsav)² + … + (obs6 − obsav)²   (20)

where

obsav = (obs1 + obs2 + obs3 + obs4 + obs5 + obs6) / 6   (21)

The sum of the squared differences between each estimated (expected) response, based on the model, and the average of all observed responses is defined as the Regression Sum of Squares (SSR):

SSR = (exp1 − obsav)² + (exp2 − obsav)² + … + (exp6 − obsav)²   (22)

The total sum of squares, SST, is equal to the sum of SSR and SSE; because SST is fixed by the data, minimizing SSE (the goal of regression procedures) is equivalent to maximizing SSR.

Using the definitions outlined above, the value of R2 can be calculated as follows:5,10

R2 = SSR / SST = 1 − SSE / SST   (23)

R2 is the proportion of the variance in the dependent variable that is attributed to (or explained by) the estimated regression model. Although useful, the R2 value is often overinterpreted or overutilized as the main factor in the determination of goodness of fit. In general, the more parameters that are added to the model, the closer R2 will approach a value of 1. It is simply an index of how close the datapoints come to the regression curve, not necessarily an index of the correctness of the model, so while R2 may be used as a starting point in the assessment of goodness of fit, it should be used in conjunction with other criteria.
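The bookkeeping behind Equations (17) to (23) is easy to script; a minimal sketch, assuming NumPy arrays of observed responses and the corresponding model predictions (the function name is hypothetical), follows:

```python
import numpy as np

def fit_statistics(observed, expected, n_params):
    """Goodness-of-fit summaries for a fitted curve [Equations (17)-(23)].
    `expected` are the model predictions at the same X-values."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    sse = np.sum((observed - expected) ** 2)          # Error Sum of Squares
    sst = np.sum((observed - observed.mean()) ** 2)   # Total Sum of Squares
    df = observed.size - n_params                     # degrees of freedom
    mse = sse / df                                    # mean square error
    rms = np.sqrt(mse)                                # root mean square (Sy.x)
    r2 = 1.0 - sse / sst                              # coefficient of determination
    return {"SSE": sse, "MSE": mse, "RMS": rms, "R2": r2}
```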
4. Analysis of Residuals

Because the goal of least squares regression procedures is to minimize the sum of the squares of the residuals, it is not surprising that methods are available for analyzing the final residuals in order to assess the conformity of the chosen model to the dataset. The most common analysis of residuals relies on the construction of a scatter diagram of the residuals.13,20 Residuals are usually plotted as a function of the values of the independent variable. If the model is adequate in describing the behavior of the data, then the residuals plot should show a random scatter of positive and negative residuals about the regression line. If, however, there is a systematic deviation of the data from the model, then the residuals plot will show nonrandom clustering of positive and negative residuals. Figure 11 illustrates this with an example of a radioligand competition binding experiment. When the data are fitted to a model of binding to a single site, a systematic deviation of the points from the regression curve is manifested as clustering in the residuals plot. In contrast, when the same dataset is fitted to a model of binding to two sites, a random scatter of the residuals about the regression line indicates a better fit of the second model. This type of residual analysis is made more quantitative when used in conjunction with the "runs test" (see below).

There are many other methods of performing detailed analyses of residuals in addition to the common method described above. These methods include cumulative probability distributions of residuals, χ² tests, and a variety of tests for serial correlation.7,10,11,20

5. The Runs Test

The runs test is used for quantifying trends in residuals, and thus is an additional measure of systematic deviations of the model from the data. A "run" is a consecutive series of residuals of the same sign (positive or negative).
FIGURE 11. An example of residuals plots. The top panel represents a curve fit based on a one binding site model to a dataset obtained from a radioligand competition binding assay (left) and its corresponding residuals plot (right). Note the clustering of positive and negative residuals. The bottom panel represents a curve fit based on a two binding site model to the same dataset (left) and its corresponding residuals plot (right). Note the random scatter of positive and negative residuals in this case.

The runs test involves a calculation of the expected number of runs, given the total number of residuals and expected variance.20 The test uses the following two formulae:

Expected Runs = 2NpNn / (Np + Nn) + 1   (24)

Expected Variance = 2NpNn(2NpNn − Np − Nn) / [(Np + Nn)²(Np + Nn − 1)]   (25)

where Np and Nn denote the total number of positive and negative residuals, respectively. The results are used in the determination of a P value.5,13 A low P value indicates a systematic deviation of the model from the data. In the example shown in Figure 11, the one-site model fit was associated with a P value of less than 0.01 (11 runs expected, 4 observed), whereas the two-site model gave a P value of 0.4 (10 runs expected, 9 observed).
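A sketch of the runs test built directly on Equations (24) and (25) is shown below; converting the observed run count to a two-tailed P value through a normal approximation is one common choice, not necessarily the exact method used by any particular package:

```python
import numpy as np
from scipy.stats import norm

def runs_test(residuals):
    """Runs test on residual signs [Equations (24) and (25)]."""
    signs = np.sign(residuals)
    signs = signs[signs != 0]                    # drop exact zeros
    n_p = int(np.sum(signs > 0))                 # positive residuals
    n_n = int(np.sum(signs < 0))                 # negative residuals
    observed = 1 + int(np.sum(signs[1:] != signs[:-1]))
    expected = 2.0 * n_p * n_n / (n_p + n_n) + 1.0
    variance = (2.0 * n_p * n_n * (2.0 * n_p * n_n - n_p - n_n)
                / ((n_p + n_n) ** 2 * (n_p + n_n - 1.0)))
    z = (observed - expected) / np.sqrt(variance)
    p_value = 2.0 * norm.sf(abs(z))              # two-tailed, normal approx.
    return expected, observed, p_value
```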
C. Optimizing the Fit

With the ubiquitous availability of powerful computers on most desktops, the impressive convergence speed of modern curve fitting programs can often lead to a false sense of security regarding the reliability of the resulting fit. Assuming that the appropriate model has been chosen, there are still a number of matters the biomedical investigator must take into account in order to ensure that the curve fitting procedure will be optimal for their dataset.

1. Data Transformations

Most standard regression techniques assume a Gaussian distribution of experimental uncertainties and also assume that any errors in Y and X are independent. As mentioned earlier, however, these assumptions are not always valid. In particular, the variance in the experimental dataset can be
heteroscedastic, that is, it changes in a systematic fashion with the variables. One method for optimizing the curve fitting process to adjust for heteroscedastic errors is to weight the data, as discussed earlier, while another approach is to transform the data to a form where the errors become more homoscedastic prior to the application of the regression technique. Transformations such as the square root or logarithm of the dependent or independent variables do not necessarily cause any problems of their own, provided they reduce rather than increase any heteroscedasticity in the data. In contrast, classical "linearising" transformations, where a new variable is derived from both the original dependent and independent variables, are quite dangerous, and it is unfortunate that they are still common practice in some laboratories. Indiscriminate data transforms of the latter kind are troublesome because they have the potential of distorting homoscedastic errors in experimental uncertainties and thus violating the assumptions of any subsequent regression procedure. Transforms are appropriate if they have a normalizing effect on heteroscedastic errors; they are not valid otherwise. In addition, some data transforms, embodied in reciprocal plots (e.g., Lineweaver-Burk) or the Scatchard transformation, violate the assumption of independence between X and Y variables and are equally inappropriate. In contrast, transformation of model parameters (as described earlier) may often have an optimising effect on the fitting procedure.
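As a sketch of the weighting alternative, most least squares routines accept per-point standard deviations; here a hypothetical hyperbolic model is fitted under an assumed constant-coefficient-of-variation error model using the sigma argument of scipy.optimize.curve_fit (the model, data, and 10% error figure are all illustrative assumptions):

```python
import numpy as np
from scipy.optimize import curve_fit

def hyperbola(x, e_max, k):
    # hypothetical saturating model
    return e_max * x / (k + x)

x = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
y = np.array([9.0, 17.0, 30.0, 51.0, 67.0, 80.0, 90.0])

# assumed error model: standard deviation proportional to the response
sigma = 0.1 * y

# weighted least squares (weights 1/sigma**2); absolute_sigma=False
# treats sigma as relative weights rather than true standard deviations
popt, pcov = curve_fit(hyperbola, x, y, p0=[100.0, 5.0],
                       sigma=sigma, absolute_sigma=False)
```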
2. Initial Estimates

All curve fitting algorithms require the specification of initial estimates of the parameters that are then optimized to yield the best fit. No regression algorithm is perfect, and failure to specify reasonable parameter estimates may result in a failure of the algorithm to converge or, more insidiously, a convergence of the curve fit on a "local minimum". If we recall our earlier discussion of the surface of the merit function that the various algorithms travel down, it is possible to envisage a multiparameter model that results in a series of troughs such that the algorithm may settle in one as if it has converged on the best fit when, in fact, a deeper trough is available elsewhere on the merit function surface. This is an example of the program converging on a local minimum (Figure 12), where the curve fit is not optimal although the user may think that the best fit has been obtained. The best safeguard against this problem is to perform the regression analysis a number of times using different initial estimates. A well-behaved model should converge on essentially the same final estimates each time.

Some commercial programs make the process of finding initial parameter estimates relatively painless by incorporating approximate rules that find initial estimates for the user. Although this is expedient, there is no substitute for the researcher personally addressing the issue of initial parameter estimates. This forces one to focus on the underlying model and the meaning of the model parameters, and it is then not too difficult to come up with a best guess. If further assistance is required, or if there are some parameters that the user does not have a particular "feel" for, then a simplex algorithm or a Monte Carlo-based algorithm (see below) may be utilized to derive estimates that can subsequently be improved upon by the more standard derivative-based algorithms.
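The "different initial estimates" safeguard is easy to automate; the sketch below (a hypothetical helper, assuming a scipy-style model function and NumPy arrays) refits from several starting points and keeps the deepest minimum found:

```python
import numpy as np
from scipy.optimize import curve_fit

def multistart_fit(model, x, y, p0_list):
    """Refit from several initial estimates and keep the best result.
    A well-behaved model should converge to essentially the same
    parameters from every reasonable starting point."""
    best = None
    for p0 in p0_list:
        try:
            popt, _ = curve_fit(model, x, y, p0=p0, maxfev=10000)
        except RuntimeError:          # this start failed to converge
            continue
        sse = np.sum((y - model(x, *popt)) ** 2)
        if best is None or sse < best[1]:
            best = (popt, sse)
    return best  # (parameters, SSE) of the deepest minimum found
```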
D. Reliability of Parameter Estimates

The determination of the reliability of the estimated parameters derived from a curve fit is as important as the actual estimation of the parametric values themselves.
FIGURE 12. Multiple minima in parameter space. The best fit is obtained at that set of parameter values yielding the smallest possible sum of squares. Depending on the initial estimates, however, the fitting algorithm may converge on parameter sets which, although yielding a reduced sum of squares, do not correspond to the minimum possible sum of squares. The regression is then said to have converged on a "local minimum".

All known methods for the calculation of standard errors and confidence intervals from regression algorithms are based on the mathematics of linear models. Since nonlinear models are more common in biology than linear models, it is perhaps disheartening to have to accept that there are no exact theories for the evaluation of parametric errors in nonlinear regression. However, there are a number of procedures available for approximating these errors such that, in most practical applications, a reasonable measure of parameter error is obtained.

1. Number of Datapoints

The number of experimental datapoints collected and analyzed will play a crucial role in the curve fitting process in one (or both) of two ways:

a. Determination of the appropriateness of the model.
b. Determination of the accuracy of the parameter estimates.

Different measures for goodness-of-fit have already been covered, but some discussion of the influence of datapoint number is also warranted at this point, since it can form an important component of choosing the right model that adequately accounts for the data. Figure 13 illustrates the effect of datapoint number on one of the most common statistical procedures utilized in discriminating between variants of the same model, i.e., the "F-test" (or "extra-sum-of-squares" test). The actual test is described in greater detail in the next section. For now, it is sufficient to point out that the F-test relies heavily on the degrees of freedom associated with the fit to any model, which are in turn dependent on the number of datapoints minus the number of parameters estimated. Although all the points in each of the panels in Figure 13 are taken from the same simulated dataset, the "correct" model (a two binding site model) could only be statistically resolved when the number of datapoints was increased from 6 (panel A) or 10 (panel B) to 20 (panel C).

Assuming that the researcher has a priori reasons for deciding that a particular model is most appropriate under their circumstances, the number of datapoints will still be crucial in determining the accuracy of the parameters based on that model. Table 1 lists the parameter estimates and corresponding 95% confidence intervals of a two binding site model (i.e., the correct model) applied to the datasets of Panel A and Panel C, respectively, of Figure 13.
FIGURE 13. Influence of datapoint number on choice of model. The radioligand competition binding curves above were simulated (with random error) according to a model for binding to two sites. The sampling of points in each of the panels is from exactly the same simulated dataset. The curves in each panel are the least squares fit of the data to either a one- or two-site binding model, as determined by an F-test (see Section VI. E). In panels A (6 points) and B (10 points), the fits could not be statistically distinguished from a one-site model. Only in panel C (20 points) were the data able to be statistically resolved into the (correct) two-site model fit.

Although the final parameter estimates appear comparable in each instance, the fit based on the small number of datapoints is associated with unacceptably large confidence intervals. There are simply not enough points to accurately define all the parameters of the model. In contrast, increasing the number of datapoints to 20 allowed for reasonable estimates of the error associated with each parameter estimate. The confidence intervals reported in the table were calculated from the asymptotic standard errors derived by the computer program from the fitting algorithm and are most likely underestimates of the true error (see below), thus rendering our (already shaken) confidence in the accuracy of minimal-datapoint parameter estimates virtually nonexistent. There have been some methods presented in the literature for maximizing the reliability of parameter estimates under conditions of minimal datapoint number (e.g., References 21 and 22), but there really is no substitute for a good sampling of experimental datapoints.
Table 1
Parameter Estimates and Associated Confidence Intervals from Fitting a Two-Site Model of Radioligand Competition Binding to Different Datapoint Numbers Taken from the Same Dataset (Panels A and C; Figure 13)

Parameter                 Estimate    95% Confidence Interval

Datapoints = 6
Maximum Asymptote(a)      96.7        45.6 to 148.3
Minimum Asymptote(b)      1.3         −84.8 to 87.56
Log IC50 High(c)          −6.7        −9.93 to −3.53
Log IC50 Low(d)           −5.1        −16.1 to 5.9
Fraction High(e)          0.74        −1.4 to 2.9

Datapoints = 20
Maximum Asymptote         99.9        95.4 to 104.4
Minimum Asymptote         0.9         −5.5 to 7.4
Log IC50 High             −6.8        −7.3 to −6.5
Log IC50 Low              −5.6        −6.4 to −4.6
Fraction High             0.64        0.4 to 0.8

(a) Y-axis value in the absence of competing drug.
(b) Y-axis value in the presence of saturating concentrations of competing drug.
(c) Potency estimate for competition at the high affinity binding site.
(d) Potency estimate for competition at the low affinity binding site.
(e) Fraction of high affinity binding sites.
2. Parameter Variance Estimates from Repeated Experiments

The most straightforward and conservative approach to building up an error profile of a given parameter is to simply repeat the same experiment many times, obtain single parameter estimates from each individual curve fit, and then derive the mean and standard deviation (and error) of the parameters using standard textbook methods. Even assuming that each curve fit is performed under optimal conditions (e.g., an appropriate number of datapoints and appropriate transformation and weighting), biomedical research is still fraught with small overall sample sizes; it is not uncommon to see n = 3-6 given in many publications as the number of times an experiment is repeated. As such, the conservative, albeit straightforward, approach to parameter error estimation just described may not have the power to resolve small differences between experimental treatments, as it is based on small sample sizes and, furthermore, does not utilize all the available datapoints. The remaining methods for parameter error estimation utilize all the datapoints in some form or other.

3. Parameter Variance Estimates from Asymptotic Standard Errors

The standard errors reported by practically all commercially available least squares regression programs fall under this category. Asymptotic standard errors are computationally the easiest to determine and, perhaps not surprisingly, the least accurate. In most instances, these standard errors will underestimate the true error that is likely to be associated with the parameter of interest.

The calculation of the asymptotic standard error and associated confidence intervals involves matrix algebra, but may be summarized as follows:23
1. Determine the Hessian (or "information") matrix. This is the matrix containing the second derivatives of the minimized χ² merit function with respect to the parameters.
2. Evaluate the variance-covariance matrix by multiplying the inverse of the Hessian matrix by the variance of the residuals of the curve fit.
3. The diagonal elements of the resulting variance-covariance matrix are the squares of the asymptotic standard errors; the off-diagonal elements of the matrix are the covariances of the parameters, and are a measure of the extent to which the parameters in the model are correlated with one another.

The computer program then reports the resulting standard errors. For these errors to actually be a good measure of the accuracy of the parameter estimates, the following assumptions must hold:23

a. The fitting equation is linear.
b. The number of datapoints is very large.
c. The covariance terms in the variance-covariance matrix are negligible.

For nonlinear models, the first assumption is invalid; however, the impact of failure to conform to this assumption may be lessened for models that are well behaved, e.g., those that contain conditionally linear parameters or can be approximated by linear functions. The second assumption can also be reasonable provided the experimenter is able to ensure an adequate sampling of datapoints. Unfortunately, the third assumption is almost never realized. As described earlier, most parameters in nonlinear models show some degree of correlation with one another; indeed, high correlations are indicative of parameter redundancies in the model. As such, ignoring the covariances from the variance-covariance matrix in the reporting of parameter errors will underestimate the true error.

Nevertheless, asymptotic standard errors may serve a useful diagnostic role. Since they will invariably be underestimates of the true error, very large standard errors or confidence intervals reported after a curve fit are indicative of a very poor fit of the associated parameter (see Table 1). This may occur, for instance, because the parameter is ill defined by the available data.
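For a least squares routine that returns the variance-covariance matrix (scipy's curve_fit, for example, returns it as pcov), the asymptotic standard errors and confidence intervals can be extracted as sketched below; the helper name is ours, and the t-based interval is one common convention rather than a universal standard:

```python
import numpy as np
from scipy.stats import t

def asymptotic_intervals(popt, pcov, n_points, alpha=0.05):
    """Asymptotic standard errors and ~95% confidence intervals.
    These rest on the linearity/large-N assumptions above, so they
    tend to understate the true parameter uncertainty."""
    df = n_points - len(popt)
    se = np.sqrt(np.diag(pcov))        # diagonal elements = squared SEs
    tcrit = t.ppf(1.0 - alpha / 2.0, df)
    lower = popt - tcrit * se
    upper = popt + tcrit * se
    return se, lower, upper
```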
4. Monte Carlo Methods

The most reliable method for the determination and validation of model parameter confidence intervals is also the most computer-intensive. Monte Carlo simulations involve the generation of multiple (hundreds to thousands of) pseudodatasets, based on a chosen model, and the subsequent analysis of the simulated datasets with the same model used to generate them, followed by construction of a frequency histogram showing the distribution of parameter estimates.17,24 Figure 14 shows a flowchart summarizing the general approach to Monte Carlo simulation.

The crucial factor in the implementation of the Monte Carlo approach is the ability to add random "error" to the pseudodataset points that accurately reflects the distribution of experimental uncertainties associated with the determination of "real" datasets. The best determinant of this error is the variance of the fit of the chosen model to real experimental data, provided that the standard assumptions underlying least squares regression analyses are valid. In addition to the appropriate choice of variance for the simulations, other key features in this approach are the choice and the number of independent variables, which again should match those determined in a typical experiment.
FIGURE 14. A general approach to Monte Carlo simulation.

The beauty of the Monte Carlo approach is that the level of accuracy with regard to the confidence interval profiles is very much in the hands of the researcher; the greater the number of simulated datasets, the greater the resolution of the confidence intervals. However, this comes at the expense of computer time; a Monte Carlo simulation of 1000 datasets may take 1000 times longer than a least squares fit of the actual experimental dataset used to pattern the simulations. Coupled with the fact that many commercially available curve fitting packages do not contain Monte Carlo-compatible programming features, the time factor involved in generating parameter confidence intervals from the Monte Carlo approach dissuades many researchers from routinely using this method. Nonetheless, great insight can be gained from Monte Carlo approaches. For instance, in addition to providing the greatest degree of accuracy in parameter error estimation, Monte Carlo methods can also guide the experimenter toward the most appropriate model reparameterizations in order to optimize the actual curve fitting procedure.17

One potential problem with the standard Monte Carlo approach is that it is necessary to define the population distributions for the errors applied to the datapoints. A normal distribution is most commonly used, but it is not always clear that it is appropriate. The bootstrap, described below, explicitly overcomes that problem.
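A bare-bones version of the loop summarized in Figure 14 might look like the following sketch, which assumes Gaussian error with a standard deviation equal to the RMS of the original fit; the function and argument names are hypothetical, and a scipy-style model function and NumPy arrays are assumed:

```python
import numpy as np
from scipy.optimize import curve_fit

def monte_carlo_ci(model, x, popt, resid_sd, n_sim=1000, seed=0):
    """Monte Carlo confidence intervals: simulate pseudodatasets from
    the fitted model plus Gaussian error, refit each one, and read the
    intervals from the distribution of the refitted parameters."""
    rng = np.random.default_rng(seed)
    y_hat = model(x, *popt)
    estimates = []
    for _ in range(n_sim):
        y_sim = y_hat + rng.normal(0.0, resid_sd, size=x.size)
        try:
            p, _ = curve_fit(model, x, y_sim, p0=popt, maxfev=10000)
            estimates.append(p)
        except RuntimeError:
            continue                  # skip non-converging replicates
    estimates = np.array(estimates)
    # 95% interval from the 2.5th and 97.5th percentiles per parameter
    return np.percentile(estimates, [2.5, 97.5], axis=0)
```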
5. The Bootstrap

"Bootstrapping" is an oddly named process that allows an approximate reconstruction of the parameters of the population from which the data have been (at least conceptually) sampled.25 Bootstrapping differs from standard Monte Carlo methods in that it makes no assumption about the form of the population, and instead assumes that the best estimate of the properties of the population is the experimentally determined dataset. The population is reconstructed by repeated resampling of the datapoints to give a large number (hundreds or even thousands) of new pseudodatasets. The resampling is done "with replacement", which is to say that any particular real datapoint can appear in each pseudodataset more than one time. The result is a population of pseudodatasets that represents a pseudo-population with approximately the same properties as the original population.
Bootstrapping can be used in several ways relevant to model fitting. First, it can provide a pseudopopulation of any parameter calculable from each pseudodataset. Thus it can be used to give confidence intervals for fitted parameters obtained from methods that do not directly provide estimates of parameter variance, such as the simplex method. Similarly, it has been used to estimate the reliability of the variance estimates obtained from other methods that rely on the covariance matrix.19

Bootstrapping is not without potential problems. One arises from the fact that the real dataset is unlikely to include any samples from the extreme tails of the overall population of possible datapoints. This means that bootstrapped populations generally have less area under the extreme tails than the real population from which the data were sampled. There are corrections that can be applied,25 but bootstrapping is not universally accepted by statisticians.
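A minimal sketch of case-resampling bootstrap confidence intervals, under the same hypothetical scipy-style conventions as the Monte Carlo example above (NumPy arrays for x and y; the helper name is ours), follows:

```python
import numpy as np
from scipy.optimize import curve_fit

def bootstrap_ci(model, x, y, p0, n_boot=1000, seed=0):
    """Bootstrap confidence intervals: resample (x, y) pairs with
    replacement, refit each pseudodataset, and read the intervals
    from the distribution of refitted parameters."""
    rng = np.random.default_rng(seed)
    n = x.size
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # indices drawn with replacement
        try:
            p, _ = curve_fit(model, x[idx], y[idx], p0=p0, maxfev=10000)
            estimates.append(p)
        except RuntimeError:
            continue
    estimates = np.array(estimates)
    return np.percentile(estimates, [2.5, 97.5], axis=0)
```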
ated with model parameters is a prelude to
the statistical testing of the parameters ac-
cording to a particular hypothesis. There-
6. Grid Search Methods
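The sketch below shows the flavor of the approach in one dimension only: a single parameter is stepped across a grid while the others are held at their best-fit values, and the interval is read off where the variance (here, the SSE) stays below a chosen threshold. A real grid search varies all parameters jointly (or re-optimizes the others at each grid point) and sets the threshold from an F-based variance ratio rather than the arbitrary ratio used here:

```python
import numpy as np

def grid_interval(sse_fn, best_params, index, grid, ratio=1.1):
    """Crude 1-D slice of a grid search. `sse_fn(params)` returns the
    SSE of the fit; parameter `index` is stepped across `grid` with the
    others fixed at their best-fit values."""
    best_params = np.asarray(best_params, dtype=float)
    grid = np.asarray(grid, dtype=float)
    sse = []
    for value in grid:
        trial = best_params.copy()
        trial[index] = value              # step one parameter over the grid
        sse.append(sse_fn(trial))
    sse = np.asarray(sse)
    inside = grid[sse <= ratio * sse.min()]   # acceptable-variance region
    return inside.min(), inside.max()
```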
fore, some objective statistical test is re-
quired in order to allow for comparisons
Another computer-intensive approach to between parameters or comparisons be-
error determination involves the construc- tween models.
1. Assessing Changes in a Model Fit between Experimental Treatments

There are three broad approaches to performing statistical comparisons between the same model parameters before and after an experimental treatment. The first relies on the use of standard parametric tests, such as the Student's t-test. The second approach relies on more computer-intensive, but preferable, comparisons between parameters based on permutation tests. The third approach differs from the other two in that it uses all the experimental data generated before and after a particular treatment in a comparison of global changes in goodness of fit. The last procedure may be summarized as follows:5,26

1. Analyze each dataset separately.
2. Sum the SSE resulting from each fit to give a new "total" sum-of-squares value (SSA). Similarly, sum the two degrees of freedom values from each fit to give a "total" degrees of freedom (dfA).
3. Pool the two sets of data into one large set.
4. Analyze this new "global" dataset to obtain a new sum-of-squares value (SSB) and degrees of freedom (dfB).
5. Calculate the following F ratio:

F = [(SSB − SSA) / (dfB − dfA)] / (SSA / dfA)   (26)

The F value is used to obtain a P value, with the numerator having (dfB − dfA) degrees of freedom and the denominator having dfA degrees of freedom. A small P value (i.e., large F value) indicates that the individual fits are better than the global, pooled fit, i.e., the experimental treatment resulted in a significant difference in the model parameters between the two datasets.

2. Choosing between Models

The F ratio can also be used to compare the fit of a single dataset to two different versions of the same model:

F = [(SS1 − SS2) / (df1 − df2)] / (SS2 / df2)   (27)

In this instance, SS1 and df1 are defined as the SSE and degrees of freedom, respectively, of the model with fewer parameters, whereas SS2 and df2 are defined as the SSE and degrees of freedom, respectively, of the model with the greater number of parameters. The addition of more parameters to a model will result in an improvement of the goodness of fit and a reduction in SSE, but at the cost of degrees of freedom. The F test [Equation (27)] attempts to quantify whether the loss of degrees of freedom in going from a simpler to a more complicated model is worth the gain in goodness of fit. A low P value is indicative of the more complicated model being the statistically better model. It should be noted, however, that the F test can only be applied to two different versions of the same model, e.g., a one binding-site versus a two binding-site curve fit. In addition, the F test is particularly harsh, since it relies so heavily on degrees of freedom and, hence, on datapoints and number of parameters. As a consequence, the test may be too conservative and favor the simpler model over the more complicated one, even when the more complicated model is actually the better description. Thus, results from the test should be regarded with caution if the number of datapoints is limited and other measures of goodness of fit appear to indicate that the simpler model is not a reasonable fit to the data. When in doubt, repeat
the experiment with greater numbers of datapoints.
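Equations (26) and (27) have the same form, so one helper covers both uses, with the P value taken from the F distribution with (df1 − df2) and df2 degrees of freedom. A sketch (the function name is hypothetical; scipy.stats supplies the distribution):

```python
from scipy.stats import f as f_dist

def extra_ss_f_test(sse1, df1, sse2, df2):
    """Extra-sum-of-squares F test [Equations (26) and (27)].
    Fit 1 is the simpler (or pooled) fit: sse1 >= sse2 and df1 > df2.
    A small P value favors the model with more parameters (or the
    separate fits over the pooled fit)."""
    F = ((sse1 - sse2) / (df1 - df2)) / (sse2 / df2)
    p_value = f_dist.sf(F, df1 - df2, df2)
    return F, p_value
```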
VII. FITTING VERSUS SMOOTHING

Throughout this article, the process of fitting empirical or mechanistic models to experimental data has generally been encompassed within the umbrella term "curve fitting". However, some distinctions can be made. Simulation refers to the process whereby the properties of the model are examined in order to determine the theoretical consequences of imposing specified conditions on the parameters and variables. The term fitting refers to the process whereby the model parameters are altered to discover which set of parameter values best approximates a set of experimental observations derived from the actual system of interest. A special case of the fitting process is the procedure known as smoothing, whereby a model is chosen to generate a fit that simply passes near or through all the experimental datapoints in order to act as a guide for the eye.

If the purpose of the curve fitting procedure is simply to smooth or to generate a standard curve for extrapolation, then the nature of the underlying model and accompanying regression technique is not crucial. If, however, the purpose of the curve fitting procedure is to obtain insight into the features of the model that describe an aspect of the biological system of interest, then the choice of model is paramount. Although linear models can give curved lines (e.g., the polynomial equations described earlier), most biological experiments that yield data described by a curve are probably best analyzed using nonlinear regression. This is because it is much more common to find a nonlinear model that can be related in a meaningful and realistic fashion to the system under study than a general linear model.
VIII. CONCLUSION

Computerized curve fitting has become nearly ubiquitous in the analysis of biomedical research data. The ease of use and speed of the modern curve fitting programs encourage researchers to use them routinely for obtaining unbiased parameter estimates where, in the not very distant past, they might have used eyeballing or linearization processes that would have contained substantial subjective elements and systematic distortions. Nevertheless, indiscriminate use of curve fitting without regard to the underlying features of the model and data is a hazardous approach. We hope that the content of this article is useful in illustrating both the strengths and some pitfalls of computer-based curve fitting, and some ways to optimize the quality and utility of the parameters so obtained.

IX. SOFTWARE

Table 2 contains a limited sampling of commercially available curve fitting programs. Some of them (e.g., EnzFitter and KELL) are more specialized in their applications than others, but all are commonly applied to curve fitting of biological models to data. Also shown in the table are the associated regression algorithms utilized by each program.

Table 2
Selected List of Commercially Available Curve Fitting Programs and Their Associated Least Squares Algorithms (distributors are listed in parentheses)

Program                   Algorithm
Enzfitter (Biosoft)       Levenberg-Marquardt; Simplex
Excel (Microsoft)         Simplex
Fig. P (Biosoft)          Levenberg-Marquardt
Kaleidagraph (Synergy)    Levenberg-Marquardt
KELL (Biosoft)            Levenberg-Marquardt
Origin (Microcal)         Levenberg-Marquardt
Prism (GraphPad)          Levenberg-Marquardt
ProFit (QuantumSoft)      Levenberg-Marquardt; Robust; Monte Carlo (Simplex)
Scientist (Micromath)     Levenberg-Marquardt; Simplex
SigmaPlot (SPSS)          Levenberg-Marquardt
REFERENCES

1. Wells, J.W., Analysis and interpretation of binding at equilibrium, in Receptor-Ligand Interactions: A Practical Approach, E.C. Hulme, Ed., Oxford University Press, Oxford, 1992, 289-395.

2. Hill, A.V., The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves, J. Physiol., 40, iv-vii, 1910.

3. Hill, A.V., The combinations of haemoglobin with oxygen and with carbon monoxide. I, Biochem. J., 7, 471-80, 1913.

4. Motulsky, H.J., Intuitive Biostatistics, Oxford University Press, New York, 1995.

5. Motulsky, H.J., Analyzing Data with GraphPad Prism, GraphPad Software Inc., San Diego, CA, 1999.

6. Ludbrook, J., Comparing methods of measurements, Clin. Exp. Pharmacol. Physiol., 24(2), 193-203, 1997.

7. Bates, D.M. and Watts, D.G., Nonlinear Regression Analysis and Its Applications, Wiley and Sons, New York, 1988.

8. Johnson, M.L. and Faunt, L.M., Parameter estimation by least-squares methods, Methods Enzymol., 210, 1-37, 1992.

9. Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P., Numerical Recipes in C. The Art of Scientific Computing, Cambridge University Press, Cambridge, MA, 1992.

10. Gunst, R.F. and Mason, R.L., Regression Analysis and Its Applications: A Data Oriented Approach, Marcel Dekker, New York, 1980.

11. Cornish-Bowden, A., Analysis of Enzyme Kinetic Data, Oxford University Press, New York, 1995.

12. Johnson, M.L., Analysis of ligand-binding data with experimental uncertainties in independent variables, Methods Enzymol., 210, 106-17, 1992.

13. Motulsky, H.J. and Ransnas, L.A., Fitting curves to data using nonlinear regression: a practical and nonmathematical review, FASEB J., 1, 365-74, 1987.

14. Jurs, P.C., Curve fitting, in Computer Software Applications in Chemistry, Wiley and Sons Inc., New York, 1996, 25-51.

15. Black, J.W. and Leff, P., Operational models of pharmacological agonism, Proc. Roy. Soc. (Lond.) B., 220, 141-62, 1983.

16. Leff, P., Prentice, D.J., Giles, H., Martin, G.R., and Wood, J., Estimation of agonist affinity and efficacy by direct, operational model-fitting, J. Pharm. Methods, 23, 225-37, 1990.
17. Christopoulos, A., Assessing the distribution of parameters in models of ligand-receptor interaction: to log or not to log, Trends Pharmacol. Sci., 19, 351-7, 1998.

18. Parker, R.B. and Waud, D.R., Pharmacological estimation of drug-receptor dissociation constants. Statistical evaluation. I. Agonists, J. Pharmacol. Exp. Ther., 177, 1-12, 1971.

19. Lew, M.J. and Angus, J.A., Analysis of competitive agonist-antagonist interactions by nonlinear regression, Trends Pharmacol. Sci., 16(10), 328-37, 1995.

20. Straume, M. and Johnson, M.L., Analysis of residuals: criteria for determining goodness-of-fit, Methods Enzymol., 210, 87-105, 1992.

21. Rovati, G.E., Rodbard, D., and Munson, P.J., DESIGN: computerized optimization of experimental design for estimating Kd and Bmax in ligand binding experiments. I. Homologous and heterologous binding to one or two classes of sites, Anal. Biochem., 174(2), 636-49, 1988.

22. Rovati, G.E., Rodbard, D., and Munson, P.J., DESIGN: computerized optimization of experimental design for estimating Kd and Bmax in ligand binding experiments. II. Simultaneous analysis of homologous and heterologous competition curves and analysis of "multiligand" dose-response surfaces, Anal. Biochem., 184(1), 172-83, 1990.

23. Johnson, M.L., Use of least-squares techniques in biochemistry, Methods Enzymol., 240, 1-22, 1994.

24. Straume, M., Veldhuis, J.D., and Johnson, M.L., Model-independent quantification of measurement error: empirical estimation of discrete variance function profiles based on standard curves, Methods Enzymol., 240, 121-50, 1994.

25. Efron, B. and Tibshirani, R.J., An Introduction to the Bootstrap, Vol. 57, Chapman and Hall, New York, 1993.

26. Ratkowsky, D., Nonlinear Regression Modelling: A Unified and Practical Approach, Marcel Dekker, New York, 1983.