SECTION I
INTRODUCTION
Tor [1] is one of the most popular anonymous communication systems in the world. In recent months, the Tor network has had hundreds of thousands of users and more than 3000 active servers [2] [3]. In recent years, Tor has become a research hotspot in the anonymous communication area. Existing research in this field mostly focuses on censorship resistance [4] [5], privacy enhancement [6] [7] [8], performance improvement [9] [10] and scalability [11] [12].
The Tor family design was first introduced in 2004 with the release of Tor version 0.0.9pre4. A node family is a set of Tor nodes that are under the administrative control of the same person or organization. To improve anonymity and reduce the risk of traffic correlation attacks, Tor never relays a user's traffic through more than one node from the same node family in a single circuit. We observe that family nodes (Tor nodes belonging to Tor families) have been playing an increasingly important role in the Tor network. However, related work on measuring the Tor network mostly studies the characteristics of the network as a whole, without considering the differences between family nodes and other nodes [3] [13] [14] [15]. This paper distinguishes family nodes from the others and compares the two types of Tor nodes to answer the following question: what is the contribution and influence of family nodes on the entire Tor network? Motivated by this question, we present an empirical analysis of Tor family. The main findings and contributions of this paper can be summarized as follows:
Family nodes form a small but fully functional subset of Tor nodes. Compared with other Tor nodes, they provide relatively stable and high-performance service to Tor users (Section III).
Family nodes naturally form a hot area in the Tor network, relaying a large amount of traffic through a small number of nodes. As their contribution to Tor's bandwidth has grown in recent years, the traffic density through them has become even higher (Section III).
As a small but high-bandwidth subset of Tor nodes, family nodes may become ideal attack targets. Selective attacks targeting family nodes may severely degrade Tor's availability at a lower cost than attacks on randomly chosen nodes (Section IV).
SECTION II
BACKGROUND AND RELATED WORK
A. Understanding Tor
Figure 1 illustrates the basic components and service process of the Tor network. As the core of the Tor network, Tor directory servers gather and distribute information about the whole Tor network. Tor relay nodes form the basis of the Tor network, and traffic between the sender and receiver is forwarded by them in a hop-by-hop fashion. The Tor proxy is the client-side component serving as an interface between user applications and the Tor network.
According to the flags that Tor directory servers assign to each relay node, Tor nodes can be classified into four types: “Exit”, “Guard”, “Middle” and “Double” nodes (denoted as E, G, M and D, respectively). E nodes are Tor nodes with only the “Exit” flag, while G nodes are Tor nodes with only the “Guard” flag. Nodes with neither the “Guard” nor the “Exit” flag are M nodes, and nodes with both the “Guard” and “Exit” flags are D nodes. Throughout this paper, D nodes are counted as neither “Guard” nodes nor “Exit” nodes. In the circuit construction phase, Tor weights node selection by node bandwidth. In addition, Tor weights the bandwidth of “Exit”- and “Guard”-flagged nodes according to the fraction of total bandwidth they make up and the position for which they are being selected. Let $p_{en}$, $p_m$ and $p_{ex}$ denote the entry, middle and exit positions in a circuit, as shown in Figure 1. The bandwidth weight for Tor nodes of type $t$ serving at position $p$ is denoted by $W_{p,t}$, where $p \in \{p_{en}, p_m, p_{ex}\}$ and $t \in \{G, E, M, D\}$. In practice, Tor directory servers calculate the values of $W_{p,t}$ according to a set of rules and publish them in the hourly Tor network consensus file. Let $B(n_i)$ and $T(n_i)$ be the bandwidth and node type of Tor node $n_i$, respectively. The approximate probability $P(n_i, p)$ that node $n_i$ is chosen to serve at position $p$ in a circuit is given by (1), where $N$ is the set of Tor nodes:
$$P(n_i, p) \approx \frac{W_{p,T(n_i)}\, B(n_i)}{\sum_{n_j \in N} W_{p,T(n_j)}\, B(n_j)} \tag{1}$$
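Equation (1) can be evaluated directly once the per-position weights are known. The sketch below is a minimal illustration, not Tor's implementation: the relay nicknames, bandwidths and weight values are hypothetical, and the weights are passed as plain floats keyed by (position, type) instead of being parsed from a consensus file.

```python
from typing import Dict, List, Tuple

# (nickname, consensus bandwidth B(n_i), node type T(n_i) in {"G", "E", "M", "D"})
Node = Tuple[str, float, str]


def selection_probabilities(
    nodes: List[Node],
    weights: Dict[Tuple[str, str], float],
    position: str,
) -> Dict[str, float]:
    """Approximate probability that each node is chosen at `position`,
    following Eq. (1): W_{p,T(n_i)} * B(n_i), normalized over all nodes."""
    weighted = {name: weights[(position, ntype)] * bw for name, bw, ntype in nodes}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


# Hypothetical relays and middle-position weights (not real consensus values).
example_nodes = [("relayA", 100.0, "G"), ("relayB", 300.0, "M"), ("relayC", 100.0, "E")]
example_weights = {("m", "G"): 0.5, ("m", "M"): 1.0, ("m", "E"): 0.5}
probs = selection_probabilities(example_nodes, example_weights, "m")
print(probs)  # relayB gets 300 / (50 + 300 + 50) = 0.75
```

Because each node's chance is proportional to its weighted bandwidth, a few high-bandwidth nodes can attract most of the selections at a given position.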
Figure 1. Tor network's basic components and its three-phase process of providing anonymity service: information acquisition, circuit construction and data transmission
B. Tor Family
A node family is a set of Tor nodes that are under the administrative control of the same person or organization. All paths Tor generates obey a constraint that “We do not choose any router in the same family as another in the same path”. This design can help to improve anonymity and reduce the risk of traffic correlation attacks. There are many node families in the Tor network. We refer to Tor nodes from these node families as family nodes. Compared with other Tor nodes, family nodes can be considered to be more organized and under more centralized control.
C. Tor Measurement
Most measurement studies of the Tor network are done by the Tor Metrics Portal [3]. It gathers live Tor network data and provides information about the Tor network, including Tor relays and users. Its statistics cover several characteristics of Tor relays: their number, country distribution, software versions, platforms, bandwidth and so on. For Tor users, it estimates the daily number of directly connecting users as well as the number of Tor bridge users. Furthermore, active measurements are conducted to study the performance of the Tor network as experienced by its users. The Tor Metrics Portal also publishes papers and technical reports on Tor network measurement. Loesing [13] measures the Tor network using the directory information archived from February 2006 to February 2009. The results show trends and reveal problems in the network ranging from flag assignment and support for dynamic IP addresses to software upgrades. In addition, reports published by the Tor Metrics Portal between 2009 and 2012 help to explain several features of the Tor network, e.g., Tor bridge usage and stability, and a censorship-detection system for Tor. However, because these studies were motivated by other questions, most of these statistics, papers and technical reports cover the entire Tor network without considering the special and important role played by family nodes.
Besides the Tor Metrics Portal, a measurement study presented in [14] shows how Tor is used and misused around the world. It also reveals that in 2008 Tor was widely used throughout the world, while router participation was limited to only a few countries. Based on passively collected Tor node information as well as the Tor Metrics Portal data set, Li et al. [15] discovered a few “super nodes” in the Tor network, i.e., nodes that are more available and reliable and that provide more bandwidth than others. In this paper, we show that family nodes naturally share some of the characteristics of these “super nodes”: they are more stable and provide higher bandwidth than other Tor nodes.
SECTION III
MEASURING FAMILY NODES IN THE TOR NETWORK
To analyze family nodes’ contribution to the entire Tor network, we divide Tor nodes into two sets: family nodes and non-family nodes. Let $N$ be a set of Tor nodes. A Tor node whose node descriptor contains a family declaration is considered a family node; a Tor node without a family declaration is considered a non-family node. $FN$ and $NFN$ denote the sets of family nodes and non-family nodes, respectively, so that $FN \cup NFN = N$ and $FN \cap NFN = \emptyset$.
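The classification rule above only checks each node's descriptor for a family declaration. A minimal sketch, assuming descriptors are available as plain text and that a declaration is a line starting with the keyword `family` followed by at least one member; the nicknames and addresses are hypothetical, and the sketch does not check whether the listed members declare the node in return.

```python
from typing import Dict, Set, Tuple


def has_family_declaration(descriptor: str) -> bool:
    """True if the descriptor has a 'family' line naming at least one member."""
    for line in descriptor.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == "family":
            return True
    return False


def partition_nodes(descriptors: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Split node IDs into (FN, NFN); every node lands in exactly one set."""
    fn = {node for node, desc in descriptors.items() if has_family_declaration(desc)}
    nfn = set(descriptors) - fn
    return fn, nfn


# Hypothetical, heavily abbreviated descriptors keyed by nickname.
example_descriptors = {
    "relayA": "router relayA 192.0.2.1 9001 0 0\nfamily relayB\n",
    "relayB": "router relayB 192.0.2.2 9001 0 0\nfamily relayA\n",
    "relayC": "router relayC 192.0.2.3 9001 0 0\n",
}
fn, nfn = partition_nodes(example_descriptors)
print(sorted(fn), sorted(nfn))  # ['relayA', 'relayB'] ['relayC']
```

Computing NFN as the set difference guarantees that the two sets partition the input, matching the definitions of FN and NFN.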
The live Tor network data provided by the Tor Metrics Portal form the basis of our study of family nodes. We downloaded all consensus and descriptor archives from 2009 to 2011 from its website¹. Using this data set, this section compares Tor nodes, family nodes and non-family nodes in terms of the number of nodes, bandwidth, online time and node type.
Figure 2. 3-month average # of online Tor nodes and family nodes from Jan. 2009 to Dec. 2011
A. Number of Family Nodes
Figure 2 gives the 3-month average numbers of online Tor nodes and family nodes from 2009 to 2011. The average number of online Tor nodes in each 3-month interval is calculated as
$$\bar{n} = \frac{1}{|C|} \sum_{c \in C} |N_c|,$$
where $C$ is the set of consensus files announced in the interval and $N_c$ is the set of nodes listed in consensus $c$. The average number of online family nodes is calculated in the same way by counting only family nodes. The curve shows the percentage of family nodes in the entire Tor network. The following can be observed from the figure:
Family nodes constitute only a small fraction (less than 15%) of the nodes in the entire Tor network.
In the long term, both the number of Tor nodes and the number of family nodes increased from 2009 to 2011. The number of Tor nodes grew from about 1,200 in early 2009 to approximately 2,500 in late 2011, while the number of family nodes grew from about 130 to approximately 350.
The percentage of family nodes in the Tor network increased from about 10% in early 2009 to approximately 15% in late 2011.
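The per-interval averages behind Figure 2 can be sketched as follows, assuming each consensus file in an interval has already been reduced to the set of node identifiers it lists and that family membership is given as a set (for example, from the descriptor-based classification). The identifiers are hypothetical.

```python
from statistics import mean
from typing import List, Set, Tuple


def interval_average_counts(
    consensuses: List[Set[str]], family: Set[str]
) -> Tuple[float, float, float]:
    """Average number of listed nodes and of family nodes over the consensus
    files of one interval, and the family percentage of those averages."""
    avg_all = float(mean(len(c) for c in consensuses))
    avg_family = float(mean(len(c & family) for c in consensuses))
    return avg_all, avg_family, 100.0 * avg_family / avg_all


# Two hypothetical consensus snapshots from the same interval.
snapshots = [{"a", "b", "c", "d"}, {"a", "b", "c", "d", "e", "f"}]
family_ids = {"a", "e"}
avg_all, avg_family, family_pct = interval_average_counts(snapshots, family_ids)
print(avg_all, avg_family, family_pct)  # 5.0 1.5 30.0
```

Replacing `len(c)` with the sum of the listed bandwidths gives the per-interval average aggregate bandwidth in the same way.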
B. Bandwidth
Figure 3 gives the 3-month average aggregate bandwidth provided by Tor nodes and family nodes from 2009 to 2011. These values are based on the estimated bandwidth of Tor nodes listed in the Tor consensus files. The average aggregate bandwidth provided by Tor nodes in each 3-month interval is calculated as
$$\bar{B} = \frac{1}{|C|} \sum_{c \in C} \sum_{n_i \in N_c} B_c(n_i),$$
where $C$ is the set of consensus files announced in the interval, $N_c$ is the set of nodes listed in consensus $c$, and $B_c(n_i)$ is the bandwidth of node $n_i$ listed in $c$. The value for online family nodes is calculated in the same way by taking only family nodes into account. The curve gives family nodes’ share of the aggregate bandwidth of all Tor nodes. The average aggregate bandwidth of both Tor nodes and family nodes in the last 3-month interval of 2011 is much higher than in earlier intervals. We investigated this “anomaly” and found that the estimated bandwidth listed in some Tor consensus files of late November and December 2011 is “overestimated” by almost two orders of magnitude. The underlying reason for the “overestimation” is unknown; since this “anomaly” has little influence on the following analysis, we treat these values as valid data. Figure 3 reveals that:
Family nodes contribute a large share of the aggregate bandwidth of the Tor network, especially in 2011.
Both the aggregate bandwidth of Tor nodes and that of family nodes increased from 2009 to 2011, even without taking the last 3 months into consideration.
The bandwidth contribution of family nodes was growing rapidly, from about 20% in early 2009 to approximately 60% in late 2011.
Figure 3. 3-month average aggregate bandwidth of Tor nodes and family nodes from Jan. 2009 to Dec. 2011
Figure 4 shows the approximate average bandwidth provided by Tor nodes, family nodes and non-family nodes on a 3-month basis. The approximate average bandwidth in each 3-month interval is calculated as the average aggregate bandwidth in that interval divided by the average number of online nodes in the same interval. The figure shows that in 2009, family nodes provided an average bandwidth about 2 times as high as that of all Tor nodes in the network, and in 2011 about 4 times as high. Compared with non-family nodes, family nodes provided an average bandwidth about 8 times higher in 2011. We conclude that the average bandwidth of family nodes is higher than that of all Tor nodes, and especially higher than that of non-family nodes. The underlying reason may be that volunteers who care enough about helping Tor users to contribute more than one relay (i.e., family operators) are also willing to provide high bandwidth. In addition, the greater funding behind some organization-operated family nodes may contribute to this phenomenon.
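Assuming the approximate average bandwidth is the ratio of the average aggregate bandwidth to the average node count, the ratios quoted above can be cross-checked from the node and bandwidth shares reported earlier in this section (about 10% of nodes and 20% of bandwidth in early 2009, about 15% and 60% in late 2011). A short worked check:

```python
from typing import Tuple


def avg_bandwidth_ratios(bw_share: float, node_share: float) -> Tuple[float, float]:
    """Family average bandwidth relative to (all nodes, non-family nodes),
    given family nodes' share of aggregate bandwidth and of node count."""
    vs_all = bw_share / node_share  # (B_FN / n_FN) / (B / n)
    vs_nonfamily = vs_all * (1 - node_share) / (1 - bw_share)
    return vs_all, vs_nonfamily


for label, bw_share, node_share in [("early 2009", 0.20, 0.10), ("late 2011", 0.60, 0.15)]:
    ratio_all, ratio_non = avg_bandwidth_ratios(bw_share, node_share)
    print(f"{label}: {ratio_all:.2f}x all nodes, {ratio_non:.2f}x non-family nodes")
```

The late-2011 shares give ratios of 4 against all nodes and 8.5 against non-family nodes, consistent with the "about 4 times" and "about 8 times" figures above.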
Figure 4. 3-month average bandwidth provided by Tor nodes, family nodes and non-family nodes from Jan. 2009 to Dec. 2011
Because of the bandwidth-weighted node selection algorithm, the probability of choosing a given Tor node for a circuit is positively correlated with its bandwidth. The probability of building a circuit through a family node is therefore much higher than through a non-family node, and family nodes see much more traffic than non-family nodes. The subset of family nodes thus forms a hot area with two properties:
The area contains only a few nodes (less than 15% of all Tor nodes, as shown in Section III-A).
The area relays a large share of the traffic in the Tor network. The traffic seen by this area is roughly proportional to the aggregate bandwidth of family nodes shown in Figure 3, about 50% in 2011.
Furthermore, both family nodes’ share of the aggregate bandwidth of the Tor network and the superiority of their average bandwidth over that of non-family nodes increased from 2009 to 2011, so the traffic density through this hot area of family nodes increased accordingly over these years.
C. Online Time
Figure 5 shows the distribution of continuous online time for family nodes and non-family nodes. We extracted 2051 Tor nodes (246 family nodes and 1805 non-family nodes) from the consensus file announced at 00:00 on Jan. 1, 2011, and obtained the continuous online time of each node by checking whether it appears in the subsequent consensus files announced in 2011. To tolerate gaps in the data, we consider a node offline only if it is absent from three (instead of one) or more consecutive consensus files. Once a node is considered offline, we stop counting its continuous online time. The figure shows that family nodes are more likely than non-family nodes to stay online continuously for a long time: the percentage of family nodes with a short continuous online time is smaller than that of non-family nodes, and the gap in favor of family nodes widens as the continuous online time grows. For example, 46% of family nodes and 69% of non-family nodes stay online continuously for less than 1 month, while the percentage of family nodes with a continuous online time over 4 months is about twice that of non-family nodes. Compared with non-family nodes, family nodes tend to be more stable and stay online continuously for longer.
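The offline rule used above can be made precise with a small function. This is a sketch under stated assumptions: each node's history is a list of booleans, one per hourly consensus file in 2011 starting with the Jan. 1 file; each consensus in which the node is listed counts as one hour; hours inside a tolerated gap of one or two missing consensuses are not counted; and the function name is hypothetical.

```python
from typing import Sequence


def continuous_online_hours(present: Sequence[bool], max_gap: int = 2) -> int:
    """Continuous online time (hours) from the first consensus onward.

    present[k] is True if the node is listed in the k-th hourly consensus.
    A node is treated as offline once it is missing from max_gap + 1 (three
    by default) consecutive consensuses; counting then stops. Shorter gaps
    are tolerated but, by assumption, not counted as online hours.
    """
    hours = 0
    missing_run = 0
    for listed in present:
        if listed:
            hours += 1
            missing_run = 0
        else:
            missing_run += 1
            if missing_run > max_gap:
                break
    return hours


history = [True, True, False, False, True, True, False, False, False, True]
print(continuous_online_hours(history))  # 4: the two-hour gap is tolerated, the three-hour gap is not
```

Counting tolerated gaps as online time instead would only require adding the gap length when the node reappears.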
Figure 5. CDF of continuous online time of Tor nodes, family nodes and non-family nodes listed in consensus announced at 00:00 Jan. 1, 2011
Figure 6. CDF of total online time of Tor nodes, family nodes and non-family nodes listed in consensus announced at 00:00 Jan. 1, 2011
Figure 6 shows the distribution of total online time in 2011 for family nodes and non-family nodes. For each of these 2501 nodes, we added one hour to its total online time for every consensus file announced in 2011 in which it appears. As with continuous online time, family nodes are more likely than non-family nodes to provide long online times: the percentage of family nodes with a total online time over 7000 hours (almost 80% of a year) is about twice that of non-family nodes. Family nodes thus tend to be more stable and provide more online hours.
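The total-online-time measure and the 7000-hour comparison can be sketched as follows, assuming each node's history is a list of booleans, one per hourly consensus file announced in 2011. The node names and hour values in the example are hypothetical.

```python
from typing import Dict, Sequence

HOURS_IN_2011 = 365 * 24  # 2011 is not a leap year: 8760 hours


def total_online_hours(present: Sequence[bool]) -> int:
    """One hour for every 2011 consensus file in which the node is listed."""
    return sum(1 for listed in present if listed)


def share_above(hours_by_node: Dict[str, int], threshold: int) -> float:
    """Fraction of nodes whose total online time exceeds `threshold` hours,
    i.e., one minus the empirical CDF evaluated at `threshold`."""
    if not hours_by_node:
        return 0.0
    above = sum(1 for h in hours_by_node.values() if h > threshold)
    return above / len(hours_by_node)


print(f"7000 h = {7000 / HOURS_IN_2011:.1%} of 2011")
example_hours = {"relayA": 8100, "relayB": 7200, "relayC": 3000, "relayD": 500}  # hypothetical
print(share_above(example_hours, 7000))  # 0.5
```

Evaluating `share_above` separately on the family and non-family nodes gives the two percentages that are compared at the 7000-hour mark.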
D. Node Types of Family Nodes
In this subsection, we study family nodes (as well as non-family nodes) from the aspect of their node types.
Table I. Number and Bandwidth of Family Nodes and Non-Family Nodes
This analysis is based on the same 2501 Tor nodes as in Section III-C. Table I gives the number of nodes of each type and their bandwidth contribution. It shows that family nodes include every node type, so they can serve at the entry, middle or exit position. In addition, family nodes contribute a large share of the bandwidth of the entire Tor network, especially of the “Double” bandwidth.
To sum up, family nodes form a small but fully functional subset of Tor nodes. They naturally share some of the characteristics of “super nodes”, providing relatively stable and high-performance service to Tor users. Furthermore, family nodes can be considered a hot area in the Tor network, relaying traffic of increasingly high density through a small number of nodes.
SECTION IV
SELECTIVE ATTACKS OVER FAMILY NODES
Li et al. [15] present a brute-force attack that lowers service availability by blocking (or even denying service to) a set of nodes in the Tor network. This kind of attack has already been observed in the real world [16]. Li et al. [15] also confirmed the existence of “super nodes” in the Tor network, i.e., a few nodes with longer life cycles and larger bandwidth contributions, and claimed that “when focusing on super nodes instead of just any relays, these attacks can be more effective”. This can be considered a selective attack, in which the adversary attacks a few high-performance Tor nodes and thereby efficiently causes significant impact on the network.
As shown in Section III, family nodes form a small node subset that provides high bandwidth and stability, which naturally coincides with some of the characteristics of “super nodes”. Moreover, the list of family nodes is much easier to obtain than a carefully selected list of “super nodes”. Furthermore, because family nodes are more organized and under more centralized control, fewer operators stand behind them than behind “super nodes”. It is therefore more practical for an adversary to attack family nodes than “super nodes”, which makes family nodes ideal targets of selective attacks. This section shows how selective attacks targeting only family nodes can cause a disproportionate degradation of availability in the Tor network.
An availability attack on family nodes can be performed in the following two ways:
Selective Access Denying Attack. The adversary blocks access to every family node at its network boundary. This prevents Tor clients in a given ISP or censorship region from reaching these family nodes directly.
Selective Service Denying Attack. The adversary makes family nodes unavailable from everywhere in the world, e.g., by launching DDoS attacks on family nodes or even by compromising their operators.
To measure the influence of these attacks, we define path failure (denoted by $PF$) as the probability that a Tor client fails to build a circuit through the nodes it chooses with the path selection algorithm. Let $B^{FN}_t$ and $B^{NFN}_t$ be the total bandwidth of family nodes and non-family nodes of node type $t$, respectively, where $t \in \{G, E, M, D\}$. $PF_p$ denotes the probability of choosing a family node at position $p$, where $p \in \{p_{en}, p_m, p_{ex}\}$. Because of Tor's bandwidth-weighted node selection algorithm (see (1)), the approximate probability that a Tor client chooses a family node at position $p$ is given by (2):
$$PF_p \approx \frac{\sum_{t} W_{p,t}\, B^{FN}_t}{\sum_{t} W_{p,t}\,\left(B^{FN}_t + B^{NFN}_t\right)} \tag{2}$$
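Equation (2) aggregates the per-node probabilities of (1) by node type. A minimal sketch; the weight and bandwidth values in the example are hypothetical and are not the Section III-D data.

```python
from typing import Dict

NODE_TYPES = ("G", "E", "M", "D")


def family_choice_probability(
    w_p: Dict[str, float],
    bw_family: Dict[str, float],
    bw_nonfamily: Dict[str, float],
) -> float:
    """Eq. (2): approximate probability of choosing a family node at one
    position p, from that position's weights W_{p,t} and the per-type
    bandwidth totals B^FN_t and B^NFN_t."""
    num = sum(w_p[t] * bw_family[t] for t in NODE_TYPES)
    den = sum(w_p[t] * (bw_family[t] + bw_nonfamily[t]) for t in NODE_TYPES)
    return num / den


# Hypothetical entry-position weights and bandwidth totals (arbitrary units).
w_entry = {"G": 1.0, "E": 0.0, "M": 0.0, "D": 0.5}
bw_fn = {"G": 200.0, "E": 100.0, "M": 50.0, "D": 400.0}
bw_nfn = {"G": 600.0, "E": 100.0, "M": 350.0, "D": 200.0}
print(family_choice_probability(w_entry, bw_fn, bw_nfn))  # 400 / 1100, about 0.364
```

Only per-type bandwidth totals are needed, so the computation does not require iterating over individual nodes once those totals are known.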
For the access denying attack, the adversary prevents Tor users in its ISP or censorship region from reaching family nodes. This attack affects only Tor clients that choose a family node for the entry position, so the path failure under the access denying attack is $PF = PF_{p_{en}}$. Taking the data presented in Section III-D as an example (with $W_{p,t}$ extracted from the corresponding consensus file), the path failure in this case is $PF = 0.41$: by selectively attacking family nodes, which make up less than 10% of the Tor network, the adversary causes a path failure of 41%. In contrast, by attacking 10% of Tor nodes at random, the adversary causes a path failure of only about 12%; to cause the same path failure as the selective attack on family nodes, the adversary has to attack 47% of Tor nodes at random.
The service denying attack is stronger than the access denying attack: it causes a path failure whenever a Tor client chooses a family node in its path, regardless of whether that family node serves as the entry, middle or exit node. Treating the choices at the three positions as independent, the path failure under the service denying attack can be determined by (3):
$$PF = 1 - \left(1 - PF_{p_{en}}\right)\left(1 - PF_{p_m}\right)\left(1 - PF_{p_{ex}}\right) \tag{3}$$
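The combination of per-position probabilities into a single path failure can be sketched as below. The function takes the entry, middle and exit probabilities as input and assumes the three position choices are independent, which ignores Tor's rule that a circuit never contains two nodes from the same family; the inputs in the example are hypothetical.

```python
from typing import Sequence


def service_denial_failure(pf_positions: Sequence[float]) -> float:
    """Path failure when a circuit fails if any position picks a family node.

    pf_positions holds the per-position probabilities (entry, middle, exit).
    The positions are treated as independent, which ignores Tor's rule that a
    circuit never contains two nodes from the same family.
    """
    success = 1.0
    for pf in pf_positions:
        success *= 1.0 - pf
    return 1.0 - success


# Hypothetical per-position probabilities (not the paper's data).
print(service_denial_failure([0.5, 0.5, 0.5]))  # 1 - 0.5**3 = 0.875
```

Under the access denying attack only the entry term matters, which corresponds to passing zero for the middle and exit probabilities.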
Taking the same data as in the analysis of the access denying attack, the path failure in this case is $PF = 0.80$: by selectively attacking family nodes, which make up less than 10% of the Tor network, the adversary causes a path failure of 80%. In contrast, by attacking 10% of Tor nodes at random, the adversary causes a path failure of only 31%; to cause the same path failure as the selective attack on family nodes, the adversary has to attack 47% of Tor nodes at random.
Selective attacks on family nodes can thus cause serious availability degradation efficiently and at relatively low cost. The high performance of family nodes, together with their small number, makes them ideal targets of selective attacks. We suggest using the censorship-resistant component of Tor (i.e., Tor bridge nodes [4]) to counter the access denying attack: once a Tor client gains access to a “secret” bridge node outside the given ISP or censorship region, it can build circuits through this bridge with little interference from the adversary. In addition, a firewall or IP blacklist at Tor nodes may help ease the threat posed by the service denying attack.
SECTION V
CONCLUSION AND FUTURE WORK
Based on 3 years of live Tor network data, this paper presents an in-depth study of Tor family. The results show that, as a small subset of Tor nodes, family nodes provide relatively stable, fully functional and high-performance service to Tor users. Furthermore, family nodes naturally form a hot area in the Tor network, relaying traffic of increasingly high density through a small fraction of nodes. The low-cost selective attacks analyzed in this paper show how adversaries can severely degrade the availability of the Tor network by attacking family nodes, which gives Tor good reason to improve its censorship-resistance and security design.
We will extend this research in the following directions: (i) examining Tor family's influence on Tor performance, availability and anonymity, especially when family nodes are under attack; (ii) looking deeper into the Tor family mechanism and discovering potential family misconfigurations in the Tor network (part of this work has been done in [17]); (iii) dividing family nodes into node families according to their operators and investigating these families to reveal their characteristics.