plik


ÿþComputers & Security (2005) 24, 280e286 www.elsevier.com/locate/cose Infection dynamics on the Internet David B. Changa,*, Carl S. Youngb a Consultant b Goldman Sachs & Co., Office of Global Security and Office of Information Security, 85 Broad Street, New York, NY 10004, USA Accepted 15 March 2005 KEYWORDS Abstract In previous works, the connectivity of nodes in social networks such as Network security; the Internet has been shown to follow a scale-free distribution in which there is Virus; a larger probability of nodes with lower connectivity and a smaller probability of Scale-free nodes with higher connectivity. This network structure facilitates communication but also aids in the propagation of viruses. In this work, solutions have been obtained for a dynamical mean-field equation that characterizes virus infections and growth in scale-free networks. In contrast to previous findings, a threshold condition has been found for the persistence of computer infections. The effect of connectivity-dependent growth and recovery rates is also reported. It has been found that it is possible to reduce the deleterious effects of viruses by preferentially discouraging growth and enhancing recovery in high-connectivity nodes. Significantly, a security   figure-of-merit  has been derived that will allow network administrators to sample their environment in real time and measure the risk relative to E-mail-borne threats. ª 2005 Published by Elsevier Ltd. where a represents the fraction of infected nodes, Introduction and background t is time, and a is the rate at which nodes become infected. Re-infection of disinfected and therefore Simplified mathematical descriptions of the dynam- susceptible nodes is not considered in this simpli- ic behavior of viruses in biological and computer fied model of behavior. A solution to Eq. (1) is systems involve the well-known logistic equation. given by This is a first order non-linear differential equation of the form eat aZ ð2Þ 1Ceat da=dtZaað1 ÿ aÞð1Þ for the case where a Z 1/2 at t Z 0. A plot of Eq. (2) yields the familiar sigmoid * Corresponding author. where the initial fraction of infected nodes is E-mail addresses: dbcsfc@aol.com (D.B. Chang), carl. young@gs.com (C.S. Young). small. Some time later, the fraction of infected 0167-4048/$ - see front matter ª 2005 Published by Elsevier Ltd. doi:10.1016/j.cose.2005.03.004 Infection dynamics on the Internet 281 nodes rises precipitously. For large time t, a ap- distribution is thought to have important security proaches unity, as all nodes have either been implications, where the highly-connected nodes infected and died or have developed an immunity play a critical role in facilitating virus propagation from infection. For biological systems the logistic (Ebel et al., 2002). Therefore, smaller values for g equation describes a population where a fraction imply a greater number of highly-connected nodes of the community has either died or developed in the network. Typical values have been calculated antibodies to the infection. The analogue of de- to be in the 2e4 range, and one study revealed veloping antibodies in a computer network is a measured value of 1.81 (Ebel et al., 2002). characterized by the remediation and patching of In an important work published in 2001, an nodes. It is clear from Eq. (3) that the dynamic analysis of the propagation of computer viruses behavior of an infection is solely dependent on the was performed using a   mean field  analysis infection rate a. (Pastor-Satorras and Vespignani, 2001). In this However, this model assumes equal probabili- paper, data on viral infections on the Internet ties for node linking and a constant network size. was analyzed, and a mean field equation depicting In other words, assumptions inherent in Eq. (1) are the time evolution of the probability of viral that the probability of infecting a particular node infection as a function of the node s connectivity is independent of the particular node itself, and was introduced. Mean field approximations repre- that the network adds no new nodes with time. sent a form of averaging over many elements of E-mail-type networks fall into a category known as a system, and are often used in physics and phase social networks that exhibit both growth and transition-type calculations. preferential attachment (Barabasi and Albert, Pastor-Satorras and Vespignani (2001) used nu- 1999). merical simulation to study the time behavior and With respect to growth, standard network mod- steady state of virus propagation, as well as to els often assume there are a fixed number of nodes obtain analytic expressions for the steady state that are either randomly connected (Erdos and virus-spreading condition. The time rate of change Renyi) or exhibit small world behavior and cluster- of the probability rk of a node with connectivity k ing (Watts and Strogatz), but where the total infected with a virus was found to equal the decay number of nodes never changes. Networks such in the probability of infection resulting from as the Internet are continuing to add nodes, applying network remediation (e.g., patching in- thereby increasing the number of vertices with fected nodes) plus a term proportional to the time. probability of linking to an already-infected node. Some networks also display preferential attach- In the steady state, vrk/vt Z 0. ment, where the probability of connecting to The authors also relied on a widely-cited result a new node is greater for nodes that already by Albert et al. (2000) that specified a value for the exhibit a higher number of connections. This exponent g in Eq. (3). Importantly, a narrow range characteristic is an important feature of the of nodes relative to their connectivity was exam- Internet, and accounts for many of the important ined in this work. The values of connectivity behavioral phenomena associated with the propa- examined ranged from nodes of low connectivity gation of viruses. Moreover, the combination of where the virus decay rate exceeded the growth preferential attachment and the continuous addi- rate and included nodes of higher connectivity tion of vertices leads to a model of network growth where the virus growth rate exceeded the decay that is scale invariant (Barabasi and Albert, 1999). rate. This analysis yielded an expression for the In contrast with other network models, the steady-state probability Qss that a given node in topology of social networks such as the Internet a scale-free network pointed to an infected node. can be characterized by a scale-free distribution of This important expression was given by network nodes. In these types of networks, the expÿ1=lm probability of connectivity P(k) for any node of QssZ ð4Þ connectivity k, scales as a power law: lm PðkÞZkÿg ð3Þ d denotes the remediation rate of infected nodes (i.e., the rate of nodes being restored for m ! k ! kmax. following infection). Eq. (3) suggests that for scale-free networks, y is the infection rate of an uninfected node if a large number of its nodes have a small number it is connected to an infected node. of links to other nodes, and a small number is k is the number of connections or links of highly-linked. Moreover, this inverse power law a node. 282 D.B. Chang, C.S. Young m is the minimum number of nodes available Eq. (5) sets an upper limit on the magnitude of for connection. the remediation-to-infection rate (i.e., d/y Z l Z y/d. 1/l). In fact, Eq. (5) defines the condition that separates a persistent infectious state from Eq. (4) implies that zero values of Qss are not a non-persistent one. We also see that the larger permitted for any finite l. This suggests that the kmax, the easier it is to satisfy Eq. (5). a computer virus can pervade a network with finite By applying Eq. (5) to the mean field equation prevalence in sufficiently large networks; once for the steady-state condition (i.e., when vrk/ established, viruses will grow or decay but not vt Z 0) and evaluating this expression under vari- remain static under steady-state conditions. The ous network connectivity conditions, we can fur- authors also concluded that these results implied ther characterize the probability that a node will scale-free networks of sufficient size required no be infected in the steady state, rss. k threshold for epidemic spreading. These results Such an analysis reveals that when there is low dramatically departed from previously held no- node connectivity, i.e., (k/m)exp(ÿ1/lm) 1, tions on infections since it was believed that viruses died out (i.e., the prevalence is zero) rssZðk=mÞexpðÿ1=lmÞð6Þ k below some threshold infection rate. The expla- nation given for this departure was the increased statistical likelihood of encountering nodes with When there is high connectivity, i.e. (k/m)exp(ÿ1/ higher connectivity in scale-free networks. lm) [ 1, In the data analysis portion of Pastor-Satorras and Vespignani (2001), the surviving probabilities rssw1 ð7Þ k of 814 different viruses in the 50-month-period between February 1996 and March 2000 were examined. It was found that file viruses (i.e., those Therefore, when the steady-state condition ap- that infect a computer when it runs an infected plies, the probability that a node with small application) exhibited an exponential decay in connectivity is infected can be much less than 1 time with a characteristic time constant of seven (increasing linearly with connectivity k), and the months. Boot viruses (i.e., those that spread by probability that a node with large connectivity infected applications but copy themselves on to becomes infected is almost 1. the boot sector of the hard drive) and macro As noted previously, Pastor-Satorras and Ves- viruses (i.e., those that infect data files and are pignani (2001) assumed values of node connectiv- therefore platform-independent), also exhibit ex- ity such that the decay rate exceeds the growth ponential decays but with a characteristic time rate for nodes of low connectivity and where the constant of 14 months. Some of the data examined growth rate exceeds the decay rate for nodes of also suggested that there might be a low level high connectivity. However, there are two other persistence in the viral infection. These findings important ranges of network connectivity condi- tended to support the analytical conclusions as tions to consider. expressed in Eq. (4). The first case is when the infection growth rate y greatly exceeds the decay rate d for all node connectivity values k. This situation exists in a network that has little anti-viral prevention and Network viruses in steady-state little remediation software. By applying a steady conditions state condition to the mean field equation it becomes apparent that a persistent infection state Further examination of the steady state condition is possible in which the probability that a given link yields interesting properties of virus propagation in points to an infected node is close to unity for all scale-free networks. Applying Eq. (4) to the afore- connectivity values k. mentioned range of steady-state values of connec- In the second case, the infection decay (i.e., tivity yields the condition remediation) rate greatly exceeds the growth rate for all k. This can occur in networks for which 1 expð1=lmÞ ðkmax=mÞð5Þ attention is paid to maintaining viral prevention software, rapid incorporation of patches, and kmax Z Nÿ1 Z the maximum number of nodes that diligence in implementing remediation measures. a single node can connect to, and m Z the minimum In this case we find no non-zero steady state exists number of nodes available for connection. when exp(1/lm) O kmax/m. Infection dynamics on the Internet 283 To recap, we have shown that a network infec- figure, the lthreshold curve of Fig. 1 is reproduced, tion condition can exist under two conditions in along with the curve for condition 1 (l Z 1/m Z 1/ the steady state: 3) discussed previously, where the virus growth rate exceeds the decay rate for all nodes. Above 1. When infection growth is larger than decay for the l Z 1/3 curve the probabilities of persistence large connectivity k and infection growth is approach unity. Between the two curves in Fig. 2 smaller than infection decay for small connec- the virus persists but with smaller probability. The tivity. persistence probabilities decrease as the ordinate 2. When infection growth is larger than infection position is decreased, and they become vanishingly decay for all k. small as the lower curve is approached. Below the lower curve, there is no persistence in the virus No steady state condition is possible when the infection. decay rate is greater than the infection rate for all connectivity k. General network infection conditions This shows that a threshold condition does indeed exist for network infection persistence in In the previous section we examined virus infec- the steady state, even for a scale-free network. tions only for the steady-state network condition. This condition depends on the size of the network As we noted earlier, in the steady-state the time through the maximum number of nodes available rate of change of probability of linking to an for connection kmax, and is given by infected node is zero. A more general situation relative to virus propagation can be obtained by lZy=dOlthreshold ð8Þ considering the time evolution of infections lead- ing up to a steady-state condition. We wish to where lthreshold Z [m ln(kmax/m)]ÿ1. explore the general time-dependence of infection Eq. (8) implies that the larger the network, the spreading, and what happens near the threshold of lower the threshold condition for infection persis- viral persistence. tence and hence a greater vulnerability to in- We assume that a virus is introduced into the fection. As indicated above, kmax can be set to network at nodes that do not have a specified Nÿ1, where N is the number of nodes in the connectivity value. Based on the discussion in network (a node cannot connect to itself, hence Sections Introduction and background and Network the Nÿ1 term). We also note that the logarithmic viruses in steady-state conditions, it might be condition for the threshold condition only applies expected that the most damage would occur if for the scaling exponent g Z 3 (considered to be the virus is introduced into the network via high a typical value for Internet/E-mail networks). connectivity nodes. However, we address the more Fig. 1 below shows the variation of lthreshold with general case in which the connectivity of the network size for g Z 3 (the value used in Barabasi initially infected nodes is arbitrary. and Albert, 1999 and Pastor-Satorras and Vespignani, At time t Z 0 for a small group of initially 2001). Infection persistence occurs when l O infected nodes, the mean field equation for the lthreshold. If l ! lthreshold, the decay rate exceeds time rate of change of the probability of linking to the growth rate and the infection dies out. an infected node simplifies to The three possible behaviors of viral infections in the network are shown in Fig. 2 below. In this vrk=vtw ÿ drk ð9Þ 0.055 Direct integration of Eq. (9) yields 0.05 rkðtÞZexpðÿdtÞð10Þ 0.045 0.04 Therefore the infection probability from the small 0.035 group of initially infected nodes drops off rapidly in time, with a time scale determined by the 3.5 4 4.5 5 5.5 6 recovery rate d. The mean field equation de- scribing the probability of linking to an infected Log[kmax] node now derives from two parts: (1) nodes not Figure 1 Variation of threshold with network size. initially infected and (2) nodes initially infected. threshold » 284 D.B. Chang, C.S. Young .5 0.4 » threshold ( k max ) ( k max ) » allgrowth 0.2 .001 5000 1.104 101 104 ( k max ) Figure 2 Boundaries between three regions of virus behavior. The probability of linking to an infected node In particular, Fig. 2 shows three regions sepa- via a node not initially infected can be obtained by rated by two curves: below the lowest curve no taking the first moment of the modified mean field persistent infection exists. Between the two equation. This yields an expression in terms of the curves, infections persist, but at a low level when first and second moments of the scale-free distri- near the lower curve. Above the upper curve, the bution P(k). infection probability of each node is close to unity. Recall that the first and second moments of P(k) When nodes in a narrow range of connectivity are defined as !kP(k)dk Z D1 and !k2P(k)dk Z D2, are initially infected, there will be no persistent respectively. viral infection in the network if 1 O lm ln(kmax/m) Using this method, a condition for the growth of for g Z 3. Since l Z y/d, this suggests increasing network infections to a persistent state has been the intrinsic decay rate and decreasing the in- found to exist when (y/d)(D2/D1) O 1. Conversely, trinsic growth rate. In addition, the no-persistence the condition for non-persistence of infection can condition will be easier to satisfy with smaller be shown to be (y/d) (D2/D1) ! 1. networks, since kmax in the logarithm term is given The probability of infection by nodes that were by Nÿ1, where N is the number of nodes in the initially infected continues to grow until it reaches network. the steady state or persistent value as specified in The time for the infection probability to reach the previous section. Specifically, when the scaling a maximum in those nodes not initially infected is exponent g is 3, the condition for persistent inversely proportional to ln(kmax/m). This suggests infectious growth becomes (kmax/m) O exp(1/lm) that a larger network will also result in a shorter as before. incubation time for a virus. Once infected, the For nodes not initially infected, a similar anal- decay time can become very long as lm ln(kmax/m) ysis reveals that when (y/d)(D2/D1) ! 1, the prob- approaches unity from below. This again implies ability of infection grows to a maximum value and increasing the intrinsic decay rate and decreasing then decays to zero. The probability of linking to the growth rate of a node. Smaller networks have an infected node that was not initially infected is shorter decay times. directly proportional to its connectivity k. Our results show that despite the fact that the We can also estimate the time required to probability of a link being connected to an in- achieve a persistent viral state by setting the fected node that was initially uninfected increases general probability of linking to an infected node with network size, the individual node infection equal to the probability in the steady-state. It has probabilities decrease with larger networks. This been found that for g Z 3, the time to achieve implies that an increase in network size is favor- non-zero persistence can be made quite long if the able relative to the chances of infecting any values for kmax, l and m are kept small. specific node. We therefore see that increased network size has competing effects on security. On one hand, Summary of results the no-persistence condition is easier to satisfy with smaller networks, as well as producing shorter It has been found that a threshold exists for the infection decay times. On the other hand, the persistence of an infection in scale-free networks probability of a particular node being infected such as the Internet. Figs. 1 and 2 plot threshold increases with network size. conditions of kmax (network size) versus l Z y/d, The infection probabilities are proportional to the ratio of intrinsic growth to intrinsic decay rates the node connections that have been previously for a scaling exponent g Z 3. infected. This would suggest that the most damage Infection dynamics on the Internet 285 0.02 0.2 0.018 0.167 » 3( k max) 0.15 » 2.5( k max) » 3.5( k max) 0.01 » 2( k max) 0.1 » 4( k max) 5.523×10- 6 0.06 0.05 0 0 5.105 1.106 0 5.105 1.106 1×103 k max 1×106 1×103 k max 1×106 Figure 3 Threshold lg vs kmax for various g Z 3, 3.5 and 4 [plot on left] and g Z 2 and 2.5 [plot on right]. The subscript of l indicates the corresponding network scaling exponent g. is achieved by infecting high-connectivity nodes, self-propagating code such as Code-Red (Moore in agreement with intuition. However, if a steady et al., 2003, 2002). Quick intervention and re- state condition applies (i.e., lm ln(kmax/m) O 1), mediation of high-connectivity nodes will increase the infection probability is independent of the the virus incubation time by decreasing the value connectivity of originally targeted nodes. We also of l, m, and kmax which appear in the denominator confirmed that adjusting the infection growth and of the expression for the time-to-persistence. decay rates induces the probability of node in- There has been considerable documentation of fection to change maximally for the highest con- modes of infection spreading. These typically nectivity nodes. involve variations on a similar theme, where In Fig. 3 below, the threshold l is plotted against viruses self-replicate and then distribute them- the connectivity kmax of the network for various selves to address book entries, MAPI mailboxes or scaling exponents g. It is interesting to note that: some other means of E-mail-based distribution. Examples of such viruses include Nimda, SoBig-A, the larger the network, the lower the threshold and variants of Melissa (Information World Review, value of l, 2001, 2003, 1999). Furthermore, in at least one the larger the exponent g for a distribution case it has been explicitly demonstrated by direct P(k), the higher the allowable value of l. measurement that a seemingly typical E-mail network obeyed a scale-free distribution with g Z 1.81 (Ebel et al., 2002). The continued exploi- Network security implications tation of E-mail as a means of virus transmission coupled with the prevalence of contact and/or The results herein suggest alternative approaches address lists creates a ready means of directed to network organization and surveillance in order attacks. to enhance security. Networks have traditionally We are not aware of an automated method of been organized into subnets based on differences examining server logs in order to determine the in functionality or user groups, rather than accord- changing hierarchy of node connectivity, and ing to topological features. However, the confir- thereby monitor the risk of infection in a targeted mation of the existence of a threshold for infection fashion. In lieu of this capability, the number persistence has significant implications, since any of entries in network users contact lists might actions that contribute to remaining below that be considered to identify the high-connectivity threshold decrease the vulnerability to infection nodes. It is not unreasonable to assume that the spreading. number of entries in contact lists follows a scale- First, it is clear that priority quarantining and free distribution across the network community patching of high-connectivity nodes is mandated. and might mirror the distribution data containing These results as well as the results of others in the server logs. Future security products might (Albert et al., 2000; Ebel et al., 2002; Pastor- include those that identify and monitor high- Satorras and Vespignani, 2001) argue strongly for connectivity network nodes in real time. preferentially monitoring these specific nodes for In view of the direct dependence on the infections. This strategy is consistent with pub- number of available nodes for connection kmax by lished recommendations for defending against the persistence threshold value, segmenting the 286 D.B. Chang, C.S. Young Ebel H, Mielsch LI, Bornholdt S. Physical Review E 2002;66. network into a hierarchy according to the number 035103(R). of nodes would appear to be advantageous. In that Erdos P, Renyi A. Publ. Math. Inst. Hung. Acad. Sci. 1960;5:17. vein, one might envision a pyramid-shaped net- Moore D, Shannon C, Claffy K. Code-red: a case study on the work topology, such that the segment with the spread and victims of an internet worm. Internet measure- lowest population has a single node. In some sense ment workshop. In: Proceedings of the second SIGCOMM workshop on internet measurement; 2002. p. 273e84. this implies a re-examination of the fundamental Moore D, Shannon C, Voelker GM, Savage S. Internet quarantine: notion of a node, where each segment consisting requirements for containing self-propagating code; April of a varying number of nodes might be considered 2003. Infocomm 2003, San Francisco, Ca. a node unto itself. New melissa virus variant on the loose. Information World Finally, and for what is believed to be the first Review October 19, 1999. Nimda worm most virulent virus ever. Information World Review time, a true security metric can be explicitly September 21, 2001. communicated based on these results. This metric Pastor-Satorras R, Vespignani A. Physical Review Letters 2001; will enable network administrators to sample their 86:3200. environment and actually measure the exposure to SoBig virus infections on the rise. Information World Review risk relative to E-mail-borne viruses in real time. January 15, 2003. Watts DJ, Strogatz SH. Nature 1998:292e440. Specifically, these results suggest the creation of a security figure-of-merit Dr Chang has 45 years of experience in industry, government, SZD1=ðD2lÞð11Þ and academia. He s served in a number of technology director, chief scientist, and senior technical management and research positions at Hughes Electronics, Occidental Research, Boeing, where Di represents the ith moment of the the U.S. Department of Commerce, and USC. He has held connectivity probability distribution P(k), and as adjunct professorships at UCI, UW, USC, and CSULA, and before l Z y/d is the ratio of infection growth-to- currently consults for a variety of organizations. Dr Chang has decay rates. over 200 publications and patents in several areas of basic and applied physics. Larger values of S imply an enhanced defense relative to the susceptibility to computer virus Carl Young is an applied physicist with a specific focus on infection. In particular, a value of S Z 1 represents quantifying risk and solving complex security problems. the threshold condition for viral persistence once Mr. Young spent 15 years in the US government designing, the virus has been introduced into the network. developing, and deploying security technology. In 1997 he was Such a metric may offer opportunities for the awarded the James R. Killian medal by the White House for development of security software designed to individual contributions to national security. He is currently the Director of Research and Analysis for the Office of Global measure, report, and alert on the value of S as Security at Goldman Sachs & Co., and lectures on science and the network connectivity evolves with time. technology applied to security as an adjunct professor at Polytechnic University in New York City. He has authored a wide variety of papers on technical security and risk-related References problems, and is the author of The Science of Security.And the Fundamentals of Risk Mitigation (to be published). Albert R, Jeong H, Barabasi AL. Nature 27 July 2000;406: Mr. Young holds bachelors and masters degrees in applied 378e381. mathematics and physics from the Massachusetts Institute of Barabasi AL, Albert R. Science 1999;286:509e11. Technology, Cambridge, Massachusetts.

Wyszukiwarka

Podobne podstrony:
Contagion on the Internet
Optional Protocol to the International Covenant on Economic, Social and Cultural Rights
Ruzgar A reaserch on the purpose of Internet usage 2005
Report on the 21st International Symposium on Shock Waves
Report on the 6th International Workshop on the Physics of Compressible Turbulent Mixing
making vise clamps on the milling machine
Dennett Facing Backwards on the Problem of Consciousness
Destiny´s Child Get on the Bus
Dijksterhuis On the benefits of thinking unconsciously
Frater SMRD Notes on the Tarot
Down on the Farm Maureen F McHugh(1)
130416131646?c tews0 on the up
The Social EconomyBR The dynamics of the social economyBR Example of Basta Arbetskooperativ
1365 Lipstick on the glass Maanam
Morderstwo w Orient Expressie (Murder on the Orient Express)

więcej podobnych podstron