13 May 2002
Physics Letters A 297 (2002) 261–266
A unified prediction of computer virus spread
in connected networks
Lora Billings
a,
∗
, William M. Spears
b
, Ira B. Schwartz
c
a
Department of Mathematical Sciences, Montclair State University, Upper Montclair, NJ 07043, USA
b
Department of Computer Science, University of Wyoming, Laramie, WY 82071, USA
c
Naval Research Laboratory, Nonlinear Dynamics System Section, Code 6792, Plasma Physics Division, Washington, DC 20375, USA
Received 8 November 2001; received in revised form 5 February 2002; accepted 5 February 2002
Communicated by C.R. Doering
Abstract
We derive two models of viral epidemiology on connected networks and compare results to simulations. The differential
equation model easily predicts the expected long term behavior by defining a boundary between survival and extinction regions.
The discrete Markov model captures the short term behavior dependent on initial conditions, providing extinction probabilities
and the fluctuations around the expected behavior. These analysis techniques provide new insight on the persistence of computer
viruses and what strategies should be devised for their control.
2002 Elsevier Science B.V. All rights reserved.
PACS: 07.05.Tp; 02.30.H; 02.50.Ga
Keywords: Computer virus; Markov models; Differential equation models
Concerns about our vulnerability to computer virus-
es have prompted a serious effort to model their
spread, and to develop a detection and control mecha-
nism that prevents widespread infection of computers
connected to the internet. Because of the widespread
connectivity of the internet, there has been much at-
tention paid to the organization and transmission of in-
formation on finite networks. One such example is the
small-world network [1], where a regular network is
combined with a few random interconnections. Sim-
ilar models have generated statistical cluster analysis
*
Corresponding author.
E-mail address: billingsl@mail.montclair.edu (L. Billings).
URL address:
http://www.csam.montclair.edu/billings
based on percolation [2], scaling laws [3], and control
of information [4]. Recent work on internet connectiv-
ity has shown that with the right scaling law, the in-
ternet is robust when computer attacks remove certain
nodes [5]. Moreover, percolation theories have been
used to model the propagation of an epidemic proba-
bilistically [6].
Recently, traditional biological approaches have
been combined with topological interconnects, such
as small-world networks [1,7], percolation theory, and
graph theory, to predict the conditions for which an
epidemic will occur [8]. Examples are discrete algo-
rithms based on Markov models and continuous or-
dinary differential equation models derived from epi-
demic compartmental models based on large popula-
tions [9]. In this Letter, we derive a Markov and an
0375-9601/02/$ – see front matter
2002 Elsevier Science B.V. All rights reserved.
PII: S 0 3 7 5 - 9 6 0 1 ( 0 2 ) 0 0 1 5 2 - 4
262
L. Billings et al. / Physics Letters A 297 (2002) 261–266
ordinary differential equation (ODE) model of viral
epidemiology on connected networks and compare re-
sults to simulations. Using both methods, we derive a
complete picture of viral propagation in a connected
network.
We continue the foundation work presented by
Kephart and White [10] by improving on some of
their assumptions and approximations. Our Markov
model is a variation of the Reed–Frost model [11]
and our differential equation is a variation of the
SIS model [12]. One major difference is that we
do not use the assumption that only one infection
can occur at a time. It is more realistic to include
the probability that multiple computers can become
infected (or cured) at a given instant. We have also
taken a new approach to model the rate at which a node
attempts to infect its neighbors. We assume that the
rate depends on the number of nodes infected, not just
the connectivity of a node as in [13,14]. An infection
occurs in two stages. Not only is there a probability
that a person can receive an infected email, but there
is a probability that a person might choose not to open
it. With these modifications to the model, we have
improved the agreement between how the differential
equation and the Markov model predict the behavior
of a simulation, and derive the threshold under which
a virus dies out. We exploit how the continuous model
predicts the long term behavior of the system, while
the discrete model captures the short term behavior
dependent on initial condition. We now present these
models.
To model the spread of viruses throughout a net-
work, we assume that the network consists of N nodes.
Each node can be in one of two medical conditions:
susceptible (S) or infected (I ). We assume discrete
time steps of arbitrary units, so at any given time n,
S
= N − I. Infected nodes can remain infected or be-
come susceptible by being cured. Likewise, suscepti-
ble nodes can remain susceptible or become infected.
Denote the probability that a susceptible node be-
comes infected as µ. This rate depends on two parame-
ters, the connectivity of the network (c) and the proba-
bility of transmitting an infection (β). The probability
that an infected node becomes susceptible is δ. Fig. 1
illustrates the transitions between these two medical
conditions, with their associated probabilities.
A discrete-time Markov chain is a dynamical sys-
tem composed of S discrete states. In this Letter, the
Fig. 1. The transition diagram showing how a node can change
medical conditions.
number of nodes in each medical condition will define
a Markov state, forming the pair (I, S). Because S is
explicitly defined by I , we only refer to I , and in this
case, S
= N +1. At each time step n, the Markov chain
can change states. We compute the probability of tran-
sitioning from state I to state I
. Let X
n
be the random
variable for the Markov chain, which can take on any
of the S states at time n. The system is described by an
S
× S matrix Q, where a component of Q is given by
(1)
Q(I, I
)
≡ p
I,I
≡ P (X
n
= I
| X
n
−1
= I).
Let x be the number of infected nodes that are cured.
Since there are I infected nodes, I
− x of them are
not cured, and remain infected. If we use δ to denote
the probability of curing an infected node, then the
probability of curing x of the infected nodes is
I
x
(δ)
x
(1
− δ)
I
−x
.
Note that δ is a constant that reflects the strength of the
immune system.
Now, since I
− I more susceptible nodes are
infected than infected nodes are cured, we know that
x
+ I
− I of the susceptible nodes must become
infected. If there are S susceptible nodes then S
−(x +
I
−I) of the susceptible nodes are not infected. Use µ
to denote the probability of infecting a susceptible
node. Then, the probability of infecting x
+ I
− I of
the susceptible nodes is simply
S
x
+ I
− I
(µ)
x
+I
−I
(1
− µ)
S
−(x+I
−I)
.
The probability that a susceptible node will become
infected is largely dependent on the number of neigh-
boring nodes in the graph that are infected, as well as
the transmission probability. Therefore, µ cannot be
approximated by a constant as in the case of the cure
rate.
A random directed graph of N nodes is con-
structed by making random, independent decisions
L. Billings et al. / Physics Letters A 297 (2002) 261–266
263
about whether to include each of the N (N
− 1) pos-
sible directed edges that can connect two nodes. Each
edge is included in the graph with a connectivity prob-
ability c. If I is the total number of infected nodes, the
probability of having 0
y I infected neighbors is
I
y
c
y
(1
− c)
I
−y
.
Assume that the probability that one of these infected
neighbors transmits the infection is β. If y is the
number of infected neighbors, then the probability that
none of these neighbors will infect a given susceptible
node is (1
− β)
y
. Thus the conditional probability that
a given node will be infected by at least one infected
neighbor is 1
− (1 − β)
y
. Therefore, the probability
that a susceptible node will be infected is
(2)
µ
=
I
y
=0
1
− (1 − β)
y
I
y
c
y
(1
− c)
I
−y
.
Note that we do not use the approximation µ
= βcI,
as in [10], which approximates the correct behavior
when I is small, or β and c are small.
Finally, we have all the components needed to find
the probability of transitioning from state I to state I
.
This is achieved by summing over all possible values
of x. The bounds are determined by examining the
combinatorials. The first combinatorial implies that
0
x I. The second implies that I − I
x
S
− I
+ I. Satisfying all of these constraints, we get
p
I,I
=
min
{I,S−I
+I}
x
=max{0,I−I
}
I
x
(δ)
x
(1
− δ)
I
−x
S
x
+ I
− I
(3)
× (µ)
x
+I
−I
(1
− µ)
S
−(x+I
−I)
.
With the transition matrix defined, we can use the
Markov model to determine various quantities of in-
terest, such as the probability distribution of infected
nodes at time n, as well as the probability of extinc-
tion. The algorithm starts with an initial condition vec-
tor of length S describing which state the network is
in. This vector is actually a probability density func-
tion (PDF) with one in the position that corresponds
to the I nodes that are initially infected. The PDF at
time n is computed from iterating the initial condition
vector by the transition matrix n times. The expected
number of infected nodes is the sum of each state
multiplied by its associated probability from the PDF.
(This calculation is equivalent to averaging the solu-
tion from a simulation over a large number of runs.)
The absorbing state of zero infected nodes at time n
captures the phenomena of extinction. This is the
case where only a few nodes have the virus, and
by chance, it does not spread fast enough to avoid
extinction. From experiments, nonzero probabilities
for extinction occur when less than 10% of the nodes
are initially infected, and the probabilities increase as
the number of infected nodes is decreased. Extinction
plays an important role later when we compare the
models.
Notice at this point that the model can easily be
expanded to include m medical conditions, assuming
that the transitions between medical conditions occur
in a ring [15]. Another important point is that the pa-
rameters need not be constants—they in fact can be
functions that depend on the state that the process is in
(as seen in the computation of µ above). This level of
generality provides for nonlinear dynamics of the sys-
tem, as we will see in the following discussion con-
cerning differential equation models of viral spread.
We continue by constructing an ordinary differen-
tial equation (ODE) that models the same process from
a continuous time perspective. The advantage of this
approach is that we can determine the long term be-
havior of the system as a function of the parameters.
Therefore, we do not need to solve the ODE, just
perform a stability analysis of the constant solutions.
Time, t , is now assumed to be continuous. We now de-
rive the process that allows the system to change state.
The number of infected nodes increases by the num-
ber of susceptible nodes that get infected. That rate
is captured by µS, which is nonlinear because of the
dependence of µ on I as shown in (2). The number
of infected nodes decreases by the number of infected
nodes that are cured, δI . The reverse holds true for
how the number of susceptible nodes change.
Let i
= I/N and s = S/N be the proportion of
infected and susceptible nodes (respectively). Since
i
+ s = 1, the system can be described in one variable
by the following equation:
(4)
di
dt
= (1 − i)µ
Ni
− δi,
with µ
Ni
defined as µ in (2), but using the next
smallest integer I of N i nodes infected.
264
L. Billings et al. / Physics Letters A 297 (2002) 261–266
Fig. 2. The stable solution as a function of δ and β, with c
= 0.0505. Note the line dividing the endemic and extinction regions, δ = Nβc.
The constant solutions are found by solving
di/dt
= 0 for i. The zero solution, i = 0, exists for
all β and δ if the connectivity, c > 0. In a region
of small β, the zero solution is stable. This region is
bounded by a curve where the zero solution loses sta-
bility and a stable nonzero solution is created. This
new solution is called the endemic solution, i
e
> 0.
Here, the virus will not naturally die out, but persist at
a constant level.
The boundary curve between the stable zero solu-
tion and the stable endemic solution is found by solv-
ing di/dt
= 0 explicitly for δ and taking the limit as i
approaches zero from the positive side,
(5)
δ
= lim
i
→0
+
(1
− i)µ
Ni
i
.
We use the following simplification from (2):
µ
Ni
i
= N
Ni
y
=0
1
− (1 − β)
y
×
(N i
− 1)!
(N i
− y)!y!
c
y
(1
− c)
Ni
−y
,
and the assumption that if i is small, then µ
Ni
≈ µ
1
.
Therefore, (5) can be approximated by
(6)
δ
≈ Nβc.
Solutions on this line have neutral stability, slowing
the convergence rates of solutions for nearby parame-
ters. If δ
Nβc, this model predicts that the disease
will die out because the only solution is i
= 0. Oth-
erwise, the disease will persist at an endemic state
greater than one. See Fig. 2 for a numerical approx-
imation of the stable solution as a function of δ and β.
Overlaid in white is the line dividing the extinction and
endemic regions.
As a reality check, we also perform a simulation of
the discrete system that takes N nodes and tracks their
status, infected or susceptible, over time. Given S sus-
ceptible and I infected nodes, the simulation begins
by finding which susceptible nodes will become in-
fected. For each susceptible node, an infected neigh-
bor network is determined by a random variable coin
flip (weighted by c) for each infected node. For each
successful trial, another coin flip (weighted by β) de-
termines if the virus is passed to the susceptible node.
If the virus is passed, the process for that susceptible
node is halted and condition will be changed to in-
fected at the next time step. This process is repeated
for all susceptible nodes. Next, the simulation finds
which infected nodes will be cured. A random coin
flip (weighted by δ) is used for each infected node.
A success results in the node changing to susceptible
at the next time step. After all the nodes have been
considered, instantaneously all the new conditions are
updated and process is repeated. Each loop represents
a time step and a run represents T time steps.
We continue with a comparison of our two models
with the simulation. The expected number of connec-
L. Billings et al. / Physics Letters A 297 (2002) 261–266
265
Fig. 3. This graph shows the time series data from the Markov
model and the simulation for the parameters N
= 100, β = 0.1200,
δ
= 0.2000, and c = 0.0505. The dashed line represents the endemic
state predicted by the ODE.
tions emanating from a single node is c(N
− 1). We
assume in a network of 100 nodes a connectivity of
5%. Therefore, in all calculations, we have fixed the
connectivity parameter at c
= 0.0505. Also note that
the simulation data is an average of the results over
3,000 runs. Sample time series are shown in Fig. 3.
The Markov model and the simulation also need an
initial condition in the form of the number of nodes
initially infected. The initial condition is important
when considering the probability of virus extinction,
when the solution reaches i
= 0. In the simulation,
the zero solutions are just part of the results averaged
over the different trials. The Markov model captures
the probability of an i
= 0 solution specifically, and
that is averaged in with the nonzero probabilities. As
discussed in [10], the distribution splits into an “ex-
tinction” component and a “survival” component. See
Fig. 4 for an example PDF. The extinction compo-
nent exists only for small initial conditions. Note that
the survival component has a mean of 60.2114 and a
standard deviation of 5.6938. The standard deviation
captures the fluctuations around the expected behavior
that will happen in the separate runs of a simulation.
Table 1 shows the results for the three methods over
a range of different initial conditions when β
= 0.1200
and δ
= 0.2000. The extinction probability from the
Markov PDF is shown in the last column. After renor-
malizing the survival component, the nonzero solu-
tions for both the Markov model and the simulation
match the ODE prediction well. Note that the ODE
can only capture the unstable i
= 0 solution in the en-
demic region when the initial condition is i
= 0, or no
nodes are infected initially.
Fig. 4. An example PDF from the Markov model. The parameters
are N
= 100, β = 0.1200, δ = 0.2000, c = 0.0505, and the initial
condition of 1 node infected. The extinction component is at i
= 0.
The survival component has a mean of 60.2114 and a standard
deviation of 5.6938.
Table 1
A comparison of the data for parameters N
= 100, β = 0.1200,
δ
= 0.2000, and c = 0.0505. The first column contains the initial
condition used in the Markov model and the simulation. The last
column shows the extinction probability found from the Markov
model
I
0
ODE
Simulation
Markov
Extinction
100
60.4450
60.1575
60.2114
0
80
60.4450
60.1115
60.2114
0
60
60.4450
60.2890
60.2114
0
40
60.4450
60.1985
60.2114
0
20
60.4450
60.2050
60.2114
0
10
60.4450
60.2120
60.2111
0
8
60.4450
60.2145
60.2089
0.000004
6
60.4450
60.1705
60.1835
0.000041
4
60.4450
59.6570
59.8765
0.000462
2
60.4450
55.4790
55.8793
0.005562
1
60.4450
42.9985
44.2045
0.265845
The original results from the Markov model and the
simulation tests are close for all β and δ parameters.
Renormalizing the survival component removes the
dependence on initial conditions so the result matches
the ODE as well. In general, the only region where
the methods do not agree is near the line of neutral
stability predicted by the ODE, δ
= Nβc. Except for
extreme values of δ (near 0), even those errors are
within 2%, where extremely slow convergence occurs
when δ and β are very close to zero.
The ODE model is useful for exploring how the
behavior of the system varies with N . Fig. 5 shows
how the number of infected nodes can be reduced by
simply reducing N . This illustrates the effectiveness
of temporarily partitioning a large network into small
isolated components in order to reduce an infection.
We are currently extending this framework to more
diverse and nonhomogeneous networks.
266
L. Billings et al. / Physics Letters A 297 (2002) 261–266
Fig. 5. This is a graph of how the long term behavior changes with
the size of the network, N . For example, the parameters in the ODE
are β
= 0.1200, δ = 0.2000, and c = 0.0505.
In summary, we have shown how a detailed pic-
ture of short and long term dynamics of viral propa-
gation in connected networks can be obtained using
both Markov and ODE models. We have tried to com-
bine the traditional tools of population biology with
a graph-theoretical approach of studying the persis-
tence of computer viruses. We have focused on im-
proving the approximation of the contact rate, which
is a parameter that captures the effective transmission
of a disease. By extracting the contact rate parameter
from real data, we could produce a metric to predict
the spread of a virus. Once a tipping point is identi-
fied, ways to control the parameter away from it could
effectively prevent large-scale epidemics.
Acknowledgements
This work was supported by the Office of Naval
Research (ONR). L.B. is supported by the ONR under
grant N00173-01-1-G911. We wish to thank Erik Bollt
for his useful comments.
References
[1] D. Callaway, M. Newman, S. Strogatz, D. Watts, Phys. Rev.
Lett. 85 (2000) 5468.
[2] M. Newman, D. Watts, Phys. Rev. E 60 (1999) 7332.
[3] A. Barabasi, R. Albert, H. Jeong, Physica A 272 (1999) 173.
[4] S. Pandit, R. Amritkar, Phys. Rev. E 60 (1999) R1119.
[5] R. Cohen, K. Erez, D. Ben-Avraham, S. Havlin, Phys. Rev.
Lett. 85 (2000) 4626.
[6] C. Moore, M. Newman, Phys. Rev. E 61 (2000) 5678.
[7] D. Watts, S. Strogatz, Nature 393 (1998) 440.
[8] B. Soh, T. Dillon, P. County, Comput. Networks ISDN Syst. 27
(1995) 1447.
[9] I. Schwartz, J. Math. Biol. 30 (1992) 473.
[10] J. Kephart, S. White, in: Proceedings of the 1991 IEEE
Computer Society Symposium on Research in Security and
Privacy, 1991, pp. 343–359.
[11] F. Hoppensteadt, Mathematical Methods of Population Biol-
ogy, Cambridge University Press, Cambridge, 1982.
[12] N. Bailey, The Mathematical Theory of Infectious Diseases
and Its Applications, Oxford University Press, New York,
1975.
[13] R. Pastor-Satorras, A. Vespignani, Phys. Rev. Lett. 86 (2001)
3200.
[14] A.-L. Barabasi, R. Albert, H. Jeong, Physica A 281 (2000) 69.
[15] W. Spears, L. Billings, I. Schwartz, Modeling viral epidemi-
ology, Naval Research Laboratory, NRL/MR/6700-01-8537,
2001.