Sensory difference tests: Overdispersion and warm-up

Ofelia Angulo a, Hye-Seong Lee b, Michael O'Mahony b,*

a Instituto Tecnologico de Veracruz, M.A. de Quevedo 2779, Veracruz, Ver, Mexico
b Department of Food Science and Technology, University of California, Davis, CA, United States

Received 14 February 2005; received in revised form 5 July 2005; accepted 28 September 2005

Available online 8 November 2005

Abstract

For sensory difference tests, one way, but not the only way, of dealing with the problem of overdispersion is to use a beta-binomial analysis. Commonly, binomial statistical analyses are used for these methods and they assume that the sensitivity of the judges is uniform. However, judge sensitivity varies and this adds a problematical extra variance to the distribution. This is termed overdispersion and renders simple binomial analysis prone to Type I error. The distribution of sensitivity of the judges is described by a beta-distribution. The analysis, combining beta and binomial distributions, gives an index, gamma. This ranges from zero, for no overdispersion, to unity, for total overdispersion. A compact beta-distribution clustered around the mean of the binomial distribution would add little extra variance and elicit minimum distortion of the binomial distribution, yielding a zero or near zero gamma value. A more scattered or even bimodal beta-distribution would have a substantial effect and yield a significant gamma value. One question that has been posed is whether some test methods are more prone to overdispersion than others. Yet, a consideration of the reasons for overdispersion would suggest that significant gamma values were more a result of obtaining a heterogeneous sample of sensitive and insensitive judges by chance. To confirm this, 'less sensitive' and 'more sensitive' samples of judges performed 2-AFC and 3-AFC tests with resulting zero gamma values, indicating no overdispersion. However, when the less and more sensitive groups were combined, significant gamma values were obtained, indicating the presence of overdispersion. In a further experiment using 2-AFC tests, when the 'less sensitive' group had its sensitivity increased by a 'warm-up' procedure, combination with the 'more sensitive' group did not result in overdispersion.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Overdispersion; Beta-binomial; Gamma; Warm-up; Difference tests; 2-AFC; 3-AFC; Triangle; Duo-trio

1. Introduction

Statistical tests based on the binomial distribution are generally used to determine whether the proportion of difference tests performed correctly is greater than chance, thus indicating that the difference was 'significant'. Yet, such binomial statistics were designed to analyze the results of tossing coins or dice when the probability of getting a target result ('heads' or 'six') is constant for each item (1/2 or 1/6). With a set of difference tests performed by a single judge during a single experimental session, it could be argued that the probability of getting a target result (test correct) is also constant over repeated tests, should there be no fatigue or adaptation effects over replicate testings. Yet, if data from several judges were to be combined, this situation would no longer hold; judges have different sensitivities and thus their probabilities of getting the target result (test correct) would vary. The assumptions for the binomial test would be violated. This violation can result in Type I errors, the declaration of a 'significant' difference when the results were probably due to chance or guessing.

The variation in the sensitivity of the judges provides an extra source of variance in the computation. There is more variance than would be expected from a mere binomial analysis. The problem of this extra variance has a name: the problem of overdispersion.
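The extra variance can be illustrated with a small simulation. The sketch below is not part of the original analysis: the probabilities (0.75 for a uniform panel, 0.60 and 0.90 for a mixed panel), the panel size and the function name are hypothetical, chosen only so that both panels share the same mean proportion correct.

```python
import random
import statistics

def simulate_counts(judge_probs, n_tests, rng):
    """Return the number of correct tests (out of n_tests) for each judge."""
    return [sum(rng.random() < p for _ in range(n_tests)) for p in judge_probs]

rng = random.Random(1)
n_tests = 10
n_judges = 20000  # deliberately large so the empirical variances are stable

# Hypothetical uniform panel: every judge has P(correct) = 0.75.
uniform_panel = [0.75] * n_judges
# Hypothetical mixed panel with the same mean: half at 0.60, half at 0.90.
mixed_panel = [0.60, 0.90] * (n_judges // 2)

var_uniform = statistics.pvariance(simulate_counts(uniform_panel, n_tests, rng))
var_mixed = statistics.pvariance(simulate_counts(mixed_panel, n_tests, rng))
var_binomial = n_tests * 0.75 * 0.25  # variance assumed by a binomial analysis

print(f"binomial theory: {var_binomial:.2f}")
print(f"uniform panel:   {var_uniform:.2f}")
print(f"mixed panel:     {var_mixed:.2f}")
```

With these assumed values the uniform panel should reproduce the binomial variance, while the mixed panel's variance should be roughly twice as large, even though the mean proportion correct is identical.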

0950-3293/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.foodqual.2005.09.015

* Corresponding author. Tel.: +1 530 756 5493; fax: +1 530 756 7320.
E-mail address: maomahony@ucdavis.edu (M. O'Mahony).

www.elsevier.com/locate/foodqual

Food Quality and Preference 18 (2007) 190–195

There are various solutions to the problem (Brockhoff, 2003; Brockhoff & Schlich, 1998; Kunert, 2001; Kunert & Meyners, 1999). However, the purpose of this paper is experimental; it is not to discuss the relative merits of the various statistical approaches. The approach that will be discussed here uses beta distributions to describe the distributions of judge sensitivities encountered during difference testing. The beta distributions are combined with the regular binomial distributions to give what are called beta-binomial distributions. These are the basis for the beta-binomial statistical analysis for difference tests. The beta-binomial with the extra variance brought in by the beta distribution will have a greater variance than the binomial distribution. This greater variance means it will be more difficult to reject the null hypothesis and declare a 'significant' difference. In this way, it can be seen that there is a risk of Type I errors if a binomial analysis is used rather than a beta-binomial analysis.

Harries and Smith (1982) studied the beta-binomial analysis for triangle tests with demonstrations of the effects of some beta distributions. More recently, the beta-binomial test has been developed by Ennis, Bi and their coworkers (Bi & Ennis, 1998, 1999a; Bi, Templeton-Janik, Ennis, & Ennis, 2000; Ennis & Bi, 1998). They describe overdispersion by an index called gamma (γ). A gamma value of zero indicates no overdispersion while a gamma value of unity indicates maximum overdispersion. Generally, values are intermediate between zero and unity. Bi and Ennis (1999b) published tables indicating how the proportions of tests required to be correct to declare a 'significant' difference varied with gamma.
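For orientation, the index can be written out under the usual textbook parameterization of the beta-binomial. This is a sketch under an assumption: the symbols a, b, μ, n and X are introduced here for illustration, and the form is taken to match the one used in the cited papers rather than quoted from them. If a judge's probability of a correct test follows a beta distribution with parameters a and b, and X is the number of correct tests out of n replicates for one judge, then

```latex
\mu = \frac{a}{a+b}, \qquad
\gamma = \frac{1}{a+b+1}, \qquad
\operatorname{Var}(X) = n\,\mu(1-\mu)\,\bigl[\,1 + (n-1)\,\gamma\,\bigr].
```

A compact beta distribution (large a + b) gives γ close to zero and the variance reduces to the binomial value nμ(1 − μ). As a + b approaches zero the beta distribution piles up at 0 and 1, γ approaches unity and the variance approaches n times the binomial value.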

For experiments in which the data analysis is in terms of d′, the beta-binomial approach fits conveniently into a Thurstonian framework (Bi & Ennis, 1998). The required increase in variance for the beta-binomial distribution is obtained by using a multiplier, which is a function of gamma. The same multiplier can be used whether the beta-binomial represents the variation in the proportion of tests correct or the variation of d′.

A further aspect of being able to correct for overdispersion is the ability to increase the sample size by combining judges and replicate testings to boost the power of the analysis. If a coin is tossed three times and three separate coins are tossed once, it is possible to combine coins and replicate tosses and have a sample size of six tosses. The same is possible for dice but not for judges performing difference tests. A judge performing three triangle tests is not equivalent to three separate judges each performing a single triangle test. Judges and replicate testings cannot be combined using a binomial statistical analysis unless the gamma value is zero. To do so would open the possibility of Type I errors. Yet, with a beta-binomial analysis, judges and replicate testings can be combined. Thus, if 10 judges each perform 10 tests, the data can be treated as a sample size of 10 × 10 = 100.
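One way to read this combination, offered here as an illustrative sketch rather than as the authors' procedure, is through the variance multiplier 1 + (n − 1)γ of the standard beta-binomial. Dividing the nominal sample size by that multiplier gives the number of independent binomial trials carrying the same information, a quantity borrowed from the 'design effect' idea in survey statistics. The function names are hypothetical.

```python
def variance_multiplier(n_reps, gamma):
    """Beta-binomial variance inflation relative to the binomial (assumed standard form)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    return 1.0 + (n_reps - 1) * gamma

def effective_sample_size(n_judges, n_reps, gamma):
    """Nominal sample size (judges x replicates) deflated by the multiplier."""
    return n_judges * n_reps / variance_multiplier(n_reps, gamma)

# 10 judges x 10 tests: 100 trials when gamma = 0, only 10 when gamma = 1.
print(effective_sample_size(10, 10, 0.0))
print(effective_sample_size(10, 10, 1.0))
# 40 judges x 6 tests with gamma = 0.12: about 150 rather than 240.
print(effective_sample_size(40, 6, 0.12))
```

At γ = 0 replicates count fully; at γ = 1 each judge contributes the information of a single trial, however many replicates are run.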

In practice, sometimes overdispersion (a γ value significantly greater than zero) is encountered; sometimes it is not. Rousseau and O'Mahony (2001), working with orange drinks, found no overdispersion with triangle and same–different tests. Braun, Rogeaux, Schneid, O'Mahony, and Rousseau (2004), using 2-AFC tests, reported overdispersion with a gamma value of 0.036, which was significantly greater than zero. Yet, it could be argued that the value was unimportantly small and only significant because of the large sample size. Rousseau and O'Mahony (2000), using triangle, dual pair, and same–different tests with orange flavored beverages, combined judges and replicate testings but did not quote a gamma value in their analysis. Ligget and Delwiche (2005) required judges to perform paired preference and 2-AFC tests for fruit flavored beverages, chips and cookies. Significant gamma values were obtained for two of the five paired preference studies and two of the five 2-AFC studies. Yet, these did not occur for the same foods; there was no consistency over products. Increasing the number of replicate testings also had little effect. When judges' performance on 2-AFC, 3-AFC, triangle and duo-trio tests with cherry flavored drinks was compared, a significant gamma value was obtained only with the duo-trio. When the performance of judges on 2-AFC tests with cherry flavored drinks was measured over 10 days, significant overdispersion occurred on three of the days.

Considering these studies, there does not appear to be any particular pattern to support the idea that one test may be more prone to overdispersion than another. The evidence, however, is still sparse. The first goal of this study was to collect some further evidence.

To gain further insight, it is worth reconsidering how values of gamma should be interpreted. Given that judges are not clones, it may be asked why cases occur when gamma values are not significant and there is no significant overdispersion. It would only be possible if the addition of a beta distribution to a binomial distribution had a minimal effect on the shape and variance of the latter. As Harries and Smith (1982) pointed out, a beta distribution that was fairly compact and clustered around the mean of a binomial distribution would have such a minimal effect. It would mean that the sensitivities of the judges in the discrimination tests were close to the mean. As they further remarked, a beta distribution that was more scattered or even bimodal would have a substantial effect on the binomial distribution and thus produce a significant gamma value. Bimodal distributions can occur in preference testing when the judges are split on their preferences and also with difference tests, if the sample contains a group of more sensitive and a group of less sensitive judges. The 'more sensitive' and 'less sensitive' judges under consideration may come from a bimodal distribution, indicating differences in the judges' sensory systems per se. Alternatively, they might be drawn from a single distribution, and the terms 'more sensitive' and 'less sensitive' merely applied to variation in judges' performance in that particular test.

To support these considerations, firstly, it may be hypothesized that if a sample of judges performing a difference test were to be made up of such 'more sensitive' and 'less sensitive' groups, then gamma values for each of the two separate groups would be low. Yet, when the two groups were combined, the beta distribution would cause significant distortion of the binomial distribution and produce a higher gamma value. The second goal of this study was to test this hypothesis.
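The logic of this hypothesis can be sketched with simulated judges. The example below is an illustration only: the group probabilities (0.90 and 0.65), the group sizes and the function names are hypothetical, and gamma is estimated with a simple method-of-moments formula derived from the standard beta-binomial variance, not with the maximum likelihood software used in this paper.

```python
import random

def simulate_group(n_judges, n_tests, p_correct, rng):
    """Correct counts for a homogeneous group: every judge shares p_correct."""
    return [sum(rng.random() < p_correct for _ in range(n_tests))
            for _ in range(n_judges)]

def moment_gamma(counts, n_tests):
    """Method-of-moments gamma, truncated to [0, 1].

    Solves Var(p_hat) = mu(1 - mu)[1 + (n - 1) gamma] / n for gamma,
    using the sample mean and variance of the per-judge proportions.
    """
    k = len(counts)
    props = [x / n_tests for x in counts]
    mu = sum(props) / k
    if mu <= 0.0 or mu >= 1.0:
        return 0.0  # no variation to partition
    s2 = sum((p - mu) ** 2 for p in props) / (k - 1)
    gamma = (n_tests * s2 / (mu * (1.0 - mu)) - 1.0) / (n_tests - 1)
    return min(1.0, max(0.0, gamma))

rng = random.Random(7)
n_tests = 20
more = simulate_group(2000, n_tests, 0.90, rng)  # hypothetical 'more sensitive'
less = simulate_group(2000, n_tests, 0.65, rng)  # hypothetical 'less sensitive'

gamma_more = moment_gamma(more, n_tests)
gamma_less = moment_gamma(less, n_tests)
gamma_combined = moment_gamma(more + less, n_tests)
print(gamma_more, gamma_less, gamma_combined)
```

Each homogeneous group should give an estimate at or near zero, while pooling the two groups should give a clearly positive value. The groups here are far larger than real panels so that the pattern is not masked by sampling noise.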

A second hypothesis concerns what Pfaffmann (1954) called 'warm-up' and which increases the measured sensitivity of judges. The term was borrowed from early studies on paired associate learning (for example, Heron, 1928; Thune, 1950) and skilled behavior acquisition (for example, Ammons, 1947). For gustation, the phenomenon is more akin to the latter, taste discrimination having many of the attributes of skilled behavior. In current practice (Dacremont, Sauvageot, & Duyen, 2000; O'Mahony, Thieme, & Goldstein, 1988; Thieme & O'Mahony, 1990) warm-up involves judges alternately testing the two stimuli to be discriminated, knowing which is which, until they have identified the difference in sensations elicited by the two confusable stimuli. Often, with confusable taste stimuli, a judge will not, at first, perceive a difference. Yet, after repeated testing, the differences between the stimuli begin to appear, sometimes suddenly. The increase in sensitivity induced by warm-up can be considered as a focusing of attention: focusing on or amplifying the differences in input from the two stimuli, while attenuating the similarities.

Consider the situation where there was a sensitive and an insensitive group combined to give a high gamma value. Then, if the less sensitive group performed the warm-up procedure, it could be hypothesized that their increase in sensitivity would render them as sensitive as the more sensitive judges. In this way, the two groups of judges would become more similar in sensitivity; gamma would decrease. The third goal of this study was to test this hypothesis.

For all three goals, a model system was used to confine the variance to judge variation rather than product variation.

2. Experiment I

The goal of this experiment was to collect further data comparing the proneness of various difference test methods to overdispersion. The test methods studied were: the triangle, duo-trio, 2-AFC and 3-AFC.

2.1. Materials and methods

2.1.1. Judges

One hundred and six judges, students, staff and friends at the University of California, Davis, participated in the experiment. None of the judges had consumed food and/or beverages within an hour before starting the experiment. Of the 106 judges tested, 13 had participated in taste sensory experiments before. The groups tested were as follows: 2-AFC method (12M, 9F, age range 22–58 yrs), 3-AFC (7M, 8F, 22–58 yrs), triangle (5M, 9F, 21–57 yrs), duo-trio (6M, 10F, 21–58 yrs). A further group of 40 judges (21M, 19F, 21–75 yrs) used all four methods.

2.1.2. Stimuli

The stimuli to be discriminated were 3 mM vs. 5 mM NaCl solutions for the group of 40 judges who performed all four tests. For the judges who performed just one test, 3 mM vs. 5 mM NaCl solutions were used for the 2-AFC and 3-AFC tests and 1 mM vs. 5 mM NaCl solutions for the less powerful (Ennis, 1993) triangle and duo-trio tests. The NaCl was reagent grade (Mallinckrodt, Inc., Paris, KY) and the solvent was Milli-Q purified water (Millipore Corp., Bedford, MA). The Milli-Q purified water had a specific conductivity < 10^-6 mho/cm and a surface tension ≥ 71 dynes/cm.

The purified water was also used for interstimulus rinsing. Stimuli were dispensed in 10 ml aliquots using Oxford Adjustable Dispensers (Lancer, St. Louis, MO) in plastic cups (1 oz. portion cups, Solo Cup Co., Urbana, IL). All stimuli were served at constant room temperature (21–24 °C) on white plastic cutting trays.

2.1.3. Procedure

A related-samples and an independent-samples design were used. For the related-samples design, judges were required to perform all four test methods in a single session. After taking demographic details, establishment of rapport and experimental instructions, judges took seven mouthrinses. They then proceeded to perform six 2-AFC, six 3-AFC, six triangle and six duo-trio tests. No interstimulus rinses were taken within tests; judges were able to rinse ad lib between tests. The order of the four test methods and the order of stimulus presentation within a method were counterbalanced over judges. Judges gave their responses verbally. Session lengths ranged from 12 to 30 min.

For the independent-samples design, separate samples of judges (noted above) used only one test method. The procedure was as above, except that each judge performed 12 tests in a session. Some judges performed in more than one group. Session lengths ranged from 5 to 15 min.

2.2. Results

For each test method, for both the related-samples and independent-samples designs, gamma values were derived according to the beta-binomial computation (Bi & Ennis, 1998, 1999a, 1999b; Bi et al., 2000). The computations were performed using IFPrograms software based on maximum likelihood (Institute for Perception, Richmond, VA). Table 1 displays the gamma values along with probabilities of getting values this large under the null hypothesis (γ = 0). From the table, it can be seen that a significant gamma value was obtained for the 2-AFC method (p = 0.01) with the related-samples design. For the independent-samples design, a near significant gamma value (p = 0.08) was obtained for the triangle method. There was no consistent trend for one particular method to be more prone to overdispersion than another. This accorded with the previous findings discussed above. Values of d′ (Ennis, 1993) are not quoted in the table, but for the related-samples design they were not significantly different, ranging from 0.8 to 1.2. For the independent-samples design, d′ values ranged from 1.1 to 2.5 for separate sets of judges. Computations of d′ were performed using the IFPrograms software.
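The idea behind a maximum likelihood estimate of gamma can be sketched as follows. This is not the IFPrograms implementation, whose internals are not described here; it is a minimal grid search over the standard beta-binomial likelihood, with μ and γ mapped to beta parameters by a = μ(1 − γ)/γ and b = (1 − μ)(1 − γ)/γ. The function names and the example counts are hypothetical.

```python
import math
from collections import Counter

def _log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def _log_choose(n, x):
    return math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)

def log_likelihood(counts, n_tests, mu, gamma):
    """Log-likelihood of per-judge correct counts; gamma = 0 is the binomial."""
    tally = Counter(counts)
    if gamma == 0.0:
        return sum(k * (_log_choose(n_tests, x) + x * math.log(mu)
                        + (n_tests - x) * math.log(1.0 - mu))
                   for x, k in tally.items())
    total = (1.0 - gamma) / gamma          # equals a + b
    a, b = mu * total, (1.0 - mu) * total
    return sum(k * (_log_choose(n_tests, x)
                    + _log_beta(x + a, n_tests - x + b) - _log_beta(a, b))
               for x, k in tally.items())

def fit_mu_gamma(counts, n_tests, steps=100):
    """Coarse grid search for the (mu, gamma) pair with the highest likelihood."""
    best_ll, best_mu, best_gamma = -math.inf, None, None
    for i in range(1, steps):              # mu in (0, 1)
        mu = i / steps
        for j in range(steps):             # gamma in [0, 1)
            gamma = j / steps
            ll = log_likelihood(counts, n_tests, mu, gamma)
            if ll > best_ll:
                best_ll, best_mu, best_gamma = ll, mu, gamma
    return best_mu, best_gamma

# Hypothetical panels of 10 judges, 12 tests each.
print(fit_mu_gamma([9] * 10, 12))             # identical judges
print(fit_mu_gamma([12] * 5 + [6] * 5, 12))   # two distinct clusters
```

A grid search is used only to keep the sketch dependency-free; a numerical optimizer would normally replace it. The significance test for γ = 0 reported in Table 1 is not reproduced here.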

3. Experiment II

This experiment was designed to test the first hypothesis that two groups of judges of different sensitivity, each displaying no overdispersion would, when combined, give significant gamma values. For this, judges performed 2-AFC and 3-AFC tests. On the basis of their d′ values, they were divided into two separate groups: 'more sensitive' and 'less sensitive'. Gamma values were computed for the two groups separately and also for when the two were combined.

3.1. Materials and methods

3.1.1. Judges

Twenty judges (3M, 17F, age range 16–63 yrs) performed 2-AFC tests and 16 judges (6M, 10F, 20–62 yrs) performed 3-AFC tests. These judges had been screened as being sufficiently sensitive to be included in the experiment. As before, the samples of judges were drawn from students, staff and friends at the University of California, Davis. All judges had fasted, except for water, for at least 1 h prior to the experiment. All judges were naïve to sensory testing except for four judges in the 2-AFC group and eight in the 3-AFC group.

3.1.2. Stimuli

The stimuli to be discriminated were 3 mM vs. 5 mM NaCl solutions as described for Experiment I.

3.1.3. Procedure

The procedure, including the rinsing protocols, was as for Experiment I, with the following modifications. In a single session, judges performed twenty 2-AFC (or 3-AFC) tests. Session lengths ranged from 10 to 20 min.

3.2. Results

For both the 2-AFC and 3-AFC tests, judges were split into 'more sensitive' and 'less sensitive' groups. For this, a simple rule was adopted, despite different chance probabilities. Judges who performed 16–20 tests correctly were deemed 'more sensitive', while those performing 10–15 tests correctly were categorized as 'less sensitive'. Judges with inferior or chance performance levels were considered not to be sufficiently sensitive to the differences between the stimuli to be included in the 'more sensitive' or 'less sensitive' groups. They were eliminated during the screening and their data are not recorded here.
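The grouping rule can be written out as a short function. This is a sketch of the rule as stated, for sessions of 20 tests; the function names are hypothetical, and treating every score below 10 as excluded is an assumption made here, since the exact screening criterion is not given.

```python
def classify_judge(n_correct, n_tests=20):
    """Group a judge by number of tests correct, using the 16-20 / 10-15 rule."""
    if n_tests != 20:
        raise ValueError("the cut-offs in the text are stated for 20 tests only")
    if not 0 <= n_correct <= n_tests:
        raise ValueError("n_correct must lie between 0 and n_tests")
    if n_correct >= 16:
        return "more sensitive"
    if n_correct >= 10:
        return "less sensitive"
    return None  # assumed: excluded at screening

def expected_correct_by_chance(n_alternatives, n_tests=20):
    """Mean number correct for a judge who is only guessing."""
    return n_tests / n_alternatives

print(classify_judge(18), classify_judge(12), classify_judge(7))
# The same cut-offs sit at different distances from chance in the two methods:
print(expected_correct_by_chance(2), expected_correct_by_chance(3))
```

The second line of output makes the 'different chance probabilities' concrete: 10 correct out of 20 is expected from guessing in the 2-AFC, whereas guessing in the 3-AFC gives fewer than 7.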

For the 2-AFC method, gamma and d′ values, with associated null probabilities and variances, were computed as in Experiment I, using IFPrograms software. Separate values were computed for the 'more sensitive' and 'less sensitive' groups as well as for the two groups combined. The same computation was performed for the 3-AFC method. The data are shown in Table 2.

From the table, it can be seen that the 'more sensitive' groups have higher d′ values than the 'less sensitive' groups, while the combined values are intermediate, as expected. It can also be seen that within each 'sensitivity' group, gamma values are zero, indicating no overdispersion. Yet, the gamma values of the combined groups are finite and significantly greater than zero (p ≤ 0.03), indicating significant overdispersion. Thus, two groups of different sensitivity, each having no overdispersion, when combined can produce finite gamma values. This confirms the first hypothesis.

It might be argued that the significant results obtained for the combined samples owe their significance to the larger sample sizes. This might indeed be true. However, the fact that the 'more sensitive' and 'less sensitive' samples had zero gamma values and the combined samples had finite values indicates how overdispersion increased when the two samples were combined.

To test consistency, the experiment was repeated on the same judges and gave the same results. For the 2-AFC, gamma values were 0.03 (more sensitive), 0.00 (less sensitive) and 0.09 (combined). For the 3-AFC, the corresponding values were 0.00, 0.00 and 0.06.

Table 1
Gamma values for the 2-AFC, 3-AFC, duo-trio and triangle methods for the related-samples and independent-samples designs for Experiment I

Experimental design               2-AFC   3-AFC   Duo-trio   Triangle

Related-samples design (a), N = 40
  γ                               0.12    0.04    0.00       0.00
  p                               0.01    0.35    1.00       1.00

Independent-samples design (b)
  N                               21      15      16         14
  γ                               0.03    0.01    0.05       0.07
  p                               0.34    0.75    0.17       0.08

(a) Each judge performed 6 tests for each method.
(b) Each judge performed 12 tests for each method.

Table 2
Gamma values for the 2-AFC and the 3-AFC methods with more sensitive and less sensitive judges for Experiment II

        2-AFC                                  3-AFC
        More        Less                       More        Less
        sensitive   sensitive   Combined       sensitive   sensitive   Combined
N       6           14          20             8           8           16
γ       0.00        0.00        0.04           0.00        0.00        0.06
p       1.00        1.00        0.03           1.00        1.00        0.01
d′      1.75        0.52        0.81           2.10        0.99        1.46
σ²      0.05        0.01        0.01           0.03        0.02        0.01

For both methods, each judge performed 20 tests.


4. Experiment III

This experiment was designed to confirm the second hypothesis that given a 'more sensitive' and a 'less sensitive' group of judges, warm-up given to the 'less sensitive' group would increase its sensitivity to such an extent that the combined group would not display overdispersion.

4.1. Materials and methods

4.1.1. Judges

Nineteen judges (3M, 16F, age range 16–63 yrs), students, staff and friends at the University of California, Davis, participated. All had fasted for at least 1 h before the experiment took place. All, except seven, were naïve to sensory testing.

4.1.2. Stimuli

The stimuli to be discriminated were 3 mM vs. 5 mM NaCl solutions, as in Experiments I and II.

4.1.3. Procedure

The procedure, including rinsing protocols, was the same as for Experiment II, judges performing twenty 2-AFC tests in a single session. Depending on performance in the session, judges were divided, as before, into a 'more sensitive' and a 'less sensitive' group, using the same performance criteria as in Experiment II. Those judges in the 'less sensitive' group repeated the experiment with the addition of a prior warm-up procedure. For this group, after the initial seven mouthrinses, judges went through the warm-up before performing the twenty 2-AFC tests. For the warm-up, judges were presented with a set of 3 mM and 5 mM NaCl solutions. Each set was labeled so that the judge was aware of the stimuli. They tasted the 3 mM and 5 mM stimuli alternately, until they felt that they could distinguish the signals indicating the difference between the two stimuli. Initially, six samples of each stimulus were presented and this was found sufficient for warm-up. When judges reported that they could distinguish between the two stimuli, testing was started immediately. Some judges reported that they did not think that they had 'warmed-up', but they proceeded with the experiment anyway. No interstimulus rinses were taken during warm-up. Session lengths ranged from 10 to 20 min.

4.2. Results

Gamma and d′ values, with associated null probabilities and variances, were computed as in Experiment II for the 'less sensitive', 'more sensitive' and combined groups, and in the former's case, before and after warm-up. These data are displayed in Table 3.

From the table, it can be seen that the 'more sensitive' group had higher d′ values than the 'less sensitive' group, with the combined groups having intermediate values, as expected. It can be seen that the 'more sensitive' and 'less sensitive' groups, whether warmed-up or not, were sufficiently homogeneous to have zero gamma values. However, when the two groups were combined, without any warm-up, there was sufficient overdispersion to give a finite and significant gamma value. Yet, if the 'less sensitive' judges were given a prior warm-up, their measured sensitivity increased enough (d′ = 0.52 vs. 0.98) for the judges to be sufficiently homogeneous with the 'more sensitive' group. This resulted in the combined group having a gamma value not significantly different from zero (γ = 0.01, p = 0.50), indicating no overdispersion and confirming the second hypothesis.

5. Discussion

In the first experiment, there was no particular trend for one difference test to be more prone to overdispersion than another. This corresponded to previous research discussed above. It would appear that overdispersion was less the result of the test method than the result of a particular sampling of judges. From a consideration of the suggestion of Harries and Smith (1982), it would appear that overdispersion is the result of a particular distribution of judge sensitivities, the beta distribution. A compact beta distribution would have little effect on the binomial distribution, so that the beta-binomial would resemble the binomial, resulting in little or no overdispersion and non-significant gamma values. This was confirmed by Experiments II and III, where 'less sensitive' groups of judges were combined with 'more sensitive' groups, both having zero gamma, to form a combined group. The latter then demonstrated overdispersion, with gamma values significantly greater than zero. The effect was prevented by providing the 'less sensitive' group with a warm-up procedure, to match its sensitivity to that of the 'more sensitive' group.

Overdispersion is thus avoided if the sample of judges happens to have sensitivities distributed close to the mean of the binomial distribution. One may speculate regarding when such a group of judges might be sampled. It might be expected with regular long-term consumers who had become uniformly highly sensitive to product differences over the years. With casual consumers, variation in frequency of use might have more of an effect on sensitivity and therefore generate overdispersion. This is a testable hypothesis.

Table 3
Gamma values for the 2-AFC method with more sensitive and less sensitive judges before and after warm-up for Experiment III

        2-AFC
        Before warm-up                         After warm-up
        More        Less                       Less
        sensitive   sensitive   Combined       sensitive   Combined
N       10          9           19             9           19
γ       0.00        0.00        0.07           0.00        0.01
p       1.00        1.00        0.00           1.00        0.50
d′      1.70        0.52        1.05           0.98        1.31
σ²      0.03        0.02        0.01           0.02        0.01

Each judge performed 20 tests in each condition.


The beta-binomial allows combination of consumers and replicate tests, to increase the sample size. This might be acceptable in psychophysics, where judges can be fairly homogeneous in some aspects of performance. Combining judges and replicate tests to obtain a large sample size might be necessary for such tasks as fitting an average ROC curve to the performance of a small group of judges (Hautus & Irwin, 1995). It might also clarify the results of some difference tests. However, the technique should be used with caution when sampling consumers. If 10 consumers each performed five difference tests, even though the sample size can be treated as 50, in reality only 10 consumers will have been tested. This is hardly a representative sample.

It should also be noted that gamma values can only be calculated to reveal possible overdispersion effects when judges perform replicate tests. If each judge were only to perform a single test, it would not be possible to determine whether there was overdispersion or not. Thus, it is recommended that judges should repeat the test at least once, to enable the presence of overdispersion to be detected. This is important to avoid the possibility of Type I errors. Overdispersion can be likened to interaction in ANOVA; its presence can only be detected with replication.
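The need for replication can be seen directly from the variance of the standard beta-binomial. The symbols below (X for one judge's number of correct tests out of n replicates, μ for the mean probability of a correct test) are introduced here for illustration, and the parameterization is assumed to be the usual one:

```latex
\operatorname{Var}(X) = n\,\mu(1-\mu)\,\bigl[\,1 + (n-1)\,\gamma\,\bigr]
\quad\Longrightarrow\quad
\operatorname{Var}(X)\big|_{n=1} = \mu(1-\mu) \ \text{ for every } \gamma .
```

With a single test per judge the term (n − 1)γ vanishes, so the data have the same distribution whatever the value of gamma, and gamma cannot be estimated. With n = 2 the multiplier becomes 1 + γ, which is the smallest design in which gamma leaves a trace in the data.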

The beta-binomial deals with sensitivity variation due to judge heterogeneity. It assumes that sensitivity is constant for a given judge over replicate tests. This may not be true for all product testing situations. If sensitivity varied over replicate tests, the statistical analysis would require a more complex beta–beta-binomial analysis. Thus, even though replication is required to detect the presence of overdispersion, the number of replicates chosen should be approached with caution. Preliminary experimentation is necessary for determining the appropriate number of tests per session. Too many replicates in a given session could induce 'taste fatigue'. The resulting change in sensitivity of the judge would entail more complicated statistics.

Although Experiment I included the triangle and duo-trio methods, Experiments II and III confined themselves to 2-AFC and 3-AFC methods. This was because the larger number of replicate testings and possible warm-up effects might have altered the cognitive strategies of the triangle and duo-trio from a 'comparison of distances' strategy to a 'skimming' strategy. While on the subject of cognitive strategies, the present research does not necessarily refute the idea that cognitive strategies can affect overdispersion. However, the lack of consistency among tests would suggest that any cognitive strategy effect must be small.

References

Ammons, R. B. (1947). Acquisition of motor skill: I. Quantitative analysis and theoretical formulation. Psychological Review, 54, 263–281.

Bi, J., & Ennis, D. M. (1998). A Thurstonian variant of the beta-binomial model for replicated difference tests. Journal of Sensory Studies, 13, 461–466.

Bi, J., & Ennis, D. M. (1999a). The power of sensory discrimination methods used in replicated difference and preference tests. Journal of Sensory Studies, 14, 289–302.

Bi, J., & Ennis, D. M. (1999b). Beta-binomial tables for replicated difference and preference tests. Journal of Sensory Studies, 14, 347–368.

Bi, J., Templeton-Janik, L., Ennis, J. M., & Ennis, D. M. (2000). Replicated difference and preference tests: how to account for inter-trial variation. Food Quality and Preference, 11, 269–273.

Braun, V., Rogeaux, M., Schneid, N., O'Mahony, M., & Rousseau, B. (2004). Corroborating the 2-AFC and 2-AC Thurstonian models using both a model system and sparkling water. Food Quality and Preference, 15, 501–507.

Brockhoff, P. B. (2003). The statistical power of replications in difference tests. Food Quality and Preference, 14, 405–417.

Brockhoff, P. B., & Schlich, P. (1998). Handling replications in discrimination tests. Food Quality and Preference, 9, 303–312.

Dacremont, C., Sauvageot, F., & Duyen, T. H. (2000). Effect of assessors expertise on efficiency of warm-up for triangle tests. Journal of Sensory Studies, 15, 151–152.

Ennis, D. M. (1993). The power of sensory discrimination methods. Journal of Sensory Studies, 8, 353–370.

Ennis, D. M., & Bi, J. (1998). The beta-binomial model: accounting for inter-trial variation in replicated difference and preference tests. Journal of Sensory Studies, 13, 389–412.

Harries, J. M., & Smith, G. L. (1982). The two-factor triangle test. Journal of Food Technology, 17, 153–162.

Hautus, M. J., & Irwin, R. J. (1995). Two models for estimating the discriminability of foods and beverages. Journal of Sensory Studies, 10, 203–215.

Heron, W. T. (1928). The warming-up effect in learning nonsense syllables. Journal of Genetic Psychology, 35, 219–228.

Kunert, J. (2001). On repeated difference testing. Food Quality and Preference, 12, 385–391.

Kunert, J., & Meyners, M. (1999). On the triangle test with replications. Food Quality and Preference, 10, 477–482.

Ligget, R. E., & Delwiche, J. (2005). The beta-binomial model: Variability in overdispersion across methods and over time. Journal of Sensory Studies, 20, 48–61.

O'Mahony, M., Thieme, U., & Goldstein, L. R. (1988). The warm-up effect as a measure of increasing the discriminability of sensory difference tests. Journal of Food Science, 53, 1848–1850.

Pfaffmann, C. (1954). Variables affecting difference tests. In D. R. Peryam, F. J. Pilgrim, & M. S. Peterson (Eds.), Food acceptance testing methodology, a symposium (pp. 4–20). Washington, DC: National Academy of Science-National Research Council.

Rousseau, B., & O'Mahony, M. (2000). Investigation of the effect of within-trial retasting and comparison of the dual-pair, same–different and triangle paradigms. Food Quality and Preference, 11, 457–464.

Rousseau, B., & O'Mahony, M. (2001). Investigation of the dual pair method as a possible alternative to the triangle and same–different tests. Journal of Sensory Studies, 16, 161–178.

Thieme, U., & O'Mahony, M. (1990). Modifications to sensory difference test protocols: the warmed up paired comparison, the single standard duo-trio and the A-not A test modified for response bias. Journal of Sensory Studies, 5, 159–176.

Thune, L. E. (1950). The effect of different types of preliminary activities on subsequent learning of paired associate material. Journal of Experimental Psychology, 40, 423–438.
