index.html
Demetrios Zeinalipour-Yazti
Department of Computer Science
University of California
Riverside, CA 92507, USA
csyiazti@cs.ucr.edu
Peer-to-Peer (P2P) file-sharing systems such as Gnutella, Morpheus and Freenet have recently attracted a lot of interest from the internet community because they realized a distributed infrastructure for sharing files. Such systems have shifted the Web's Client-Server model paradigm into a Client-Client model. The tremendous success of such systems has proven that purely distributed search systems are feasible and that they may change the way we interact on the Internet. Beside the several advantages that have been uncovered by P2P systems, such as robustness, scalability and high fault tolerance various other questions and issues arise in the context of Security. Many P2P protocols are bundled along with an adequate amount of security mechanisms but are proprietary which makes their analysis difficult. The Gnutella Protocol on the other hand is an open protocol, which doesn't highlight security in the sake of its simplicity. Most security weaknesses of the Gnutella Protocol could be avoided if the protocol was taking into account that Peers may misbehave.
In this paper we provide an overview of the Gnutella Protocol Specification, describe several of its weaknesses and show how they can be turned into Distributed Denial of Service Attacks, User's Privacy Violation and IP Harvesting. We present the weaknesses with experimental attacks that we have performed on the Gnutella Network. We finally evaluate how these attacks could be avoided and suggest in some cases improvements on the protocol.
The Gnutella community is increasing day by day at extremely high rates. Clip2 [19] shows that the typical number of peers found in the Gnutella [12] network during a weekday is peers sharing files. By the time of this writing another huge network of peers, namely Morpheus [18], which was previously operating over the Kazaa [22] protocol, joined the Gnutella Network. Cnet.com [25] has logged downloads of the popular Morpheus Gnutella beta client in a period of 12 days making it the most popular download on the internet.
Beside the tremendous success that these systems meet, they are also facing many security threats. Although we won't consider in this paper attacks or flaws which are not directly related to the Gnutella protocol we mention below a few other "side-effects" of using P2P clients. Recently the Morpheus network suspended its operation after Kazaa, which used to be Morpheus protocol vendor, disabled the protocol from machines that were using the Morpheus Client [12]. After this attack, Morpheus immediately switched its protocol to Gnutella to prevent similar attacks in the future. Spyware programs are also a huge problem in the P2P community. A spyware is a program that is usually distributed along with the P2P client and which sends out personal user information. Cydoor [23] is an example of such a program which is distributed along with the some of the most popular P2P clients. Viruses of course are another potential thread for the P2P community. P2P systems are just another medium over which viruses can spread efficiently, although it is estimated [13] that they are not as dangerous as viruses which are spread over the email. Most Antivirus vendors such as Symantec have already released protection updates for the notorious Trojan horses VBS.Gnutella and W32.Gnuman.Worm. Finally, poorly written P2P Clients are another problem in these networks, since they expose their users to all the security flaws of the particular P2P client. For example, the actual transfer of files in Gnutella is done with the HTTP protocol. This means that each Gnutella Client instantiates a "mini" web-server which is capable of serving other Gnutella Clients after their query was performed over the Gnutella network. These "mini" web-servers are sometimes poorly written making them vulnerable to attacks.
Although all the pre-mentioned threads affect users that use P2P clients they are not a result of the Gnutella Protocol itself. The Gnutella Protocol itself though presents another set of security problems which can jeopardize the normal operation of this community. Security was never a design issues for Gnutella. Gnutella's security is sacrificed in the sake of simplicity and performance. Messages sent across to other peers are sent in plain text and are readable and modifiable by anyone. This results in serious problems such as spamming, where a peer pretends to have "anything" we ask for. We will exploit this weakness in section and turn it into an active Distributed Denial of Service Attack.
Encryption is also orthogonal to the way the Gnutella Protocol operates. Since each peer must be able to decrypt the sent message in a reasonably small time it would make no sense to embed encryption in the protocol. This means that we would probably never be capable of protecting a Gnutella user from some malicious user which tries to spy someone else's activity, violating personal privacy in this way.
We believe that most of the threats could be eliminated if the Gnutella protocol was taking into account security. By this we mean that the Gnutella Protocol should enforce the implementation of several features, by stating them clearly in the protocol specification, instead of doing "recommendations". A different design of the protocol could enforce critical, to our opinion, features that this particular P2P protocol should meet.
The focus of our work is to identify security weaknesses of the protocol and to exploit them by setting up proof of concept attacks. The remaining of this paper is organized as follows: Section 2 presents Related Work in this area, Section 3 an overview of the Gnutella Protocol for readers who are not familiar with the Protocol details. Section 4 presents a taxonomy of the Gnutella weaknesses and evaluates how these attacks could be avoided by suggesting in some cases improvements on the protocol. Finally we provide our conclusions and make a reference to future work in Section 5.
Although Gnutella has attracted a lot of interest from both the industry and the academic environment regarding its communication and data model [4][5][6], it has not received the adequate interest regarding its security model. To our knowledge, no previous work has been published regarding the security aspects of this Protocol. Some presentations and newsgroup postings are interesting and are hence described in brief below.
S. Bellovin analyzes and compares in his 9th Usenix Security symposium presentation [1], the Napster and Gnutella protocols. His work is mainly focusing on issues such as possible new attacks, traceability of behavior, and privacy but does not provides an insight of how dangerous these issues may become.
Parashar et al. are evaluating in [3] security mechanisms in Peer-to-Peer Applications. This work is not directly related to the Gnutella Protocol but it rather defines a general framework for deploying security mechanisms in P2P networks. They start out by categorizing P2P applications into three categories a) Distributed File Sharing Applications, such as Gnutella, b) Real-time Communications applications, such as Instant Messengers, and c) Distributed Computing applications such as the SETI@Home project [9]. They then discuss how three different Security Enabling Technologies, namely Cryptography, Smart Cards and Steganography can be used to make P2P more secure.
Endeavors Technology's Magi Enterprise software [21], is one of the first highly secure file sharing and communication capable system, which can be as secure as software used across banks and other financial institutions. The company has implemented a full-featured X.509 Public Key Infrastructure (PKI) over a Secure Sockets Layer (SSL) network backbone, which in combination provides the cryptographic standard for Internet e-commerce. The use of X.509 PKI authentication allows security certificates from Endeavors, or from any other recognized X.509 certificate authority, to be used to establish the true identity of any Magi-enabled peer device when it comes on-line and the use of SSL point-to-point security encryption enables each pair of peers that communicate with each other to have a unique key for that pairing. The problem with this technology is that it is proprietary and discovery of security flaws can only be based on reverse engineering techniques, such as the Napster case [14]. Some questions regarding the scalability of the system also arise.
Seth McGann describes in a posting to the Security Focus Online, BugTraq Archive [8], how somebody can spread Self Replicating Servents, how to perform Man in the Middle Attacks and finally how to implement a Gnutella Port Scanner.
Gnutella's distributed search protocol [24] allows a set of peers, servents or clients, to perform filename searches over other clients without the need of an intermediate Index Server. The Gnutella network topology is a pure Ad-Hoc topology where Clients may join or leave the network at any time without affecting the rest topology in any sense. The Protocol hence, is designed in a highly fault-tolerant fashion with a quite big overhead [4] of synchronization messages traveling through its network. All searches are performed over the Gnutella network while all file downloads are done offline. In this way every servent that needs to serve a file launches a mini HTTP 1.1 web server and communicates with the interested servents with HTTP commands.
The core of the protocol consists of a set of descriptors which are used for communication between servents and also sets rules for the inter-servents descriptors exchange. These descriptors are Ping, Pong, Query, QueryHit and Push. Ping messages are sent by a servent that needs to discover hosts that are currently active on the network. Pong messages on the other hand are responses to Ping requests every time a host wants to come in communication with the peer that requests to join the network (Figure [1], step 3). Although the protocol doesn't set any bound on the amount of incoming or outcoming connections from a particular host, most Gnutella clients come with a default value for these parameters, usually , but which are adjustable by the user. A client joining the Gnutella network for the first time may of course not have any clue regarding the current topology or his neighboring peers making the connection to the network impossible. For this reason most client vendors have setup Host-Caches Servers which serve clients with IP addresses of servents currently connected to the network (step 1). Another mechanism which is widely deployed, but is not part of the protocol, is the use of local caches of IP addresses from previous connections. In this way a servent doesn't need to connect to an IP acquisition server (i.e. a host-cache) but can rather try to establish a connection with one of its past peers.
Right after a peer has obtained a valid IP address and socket port of another servent it may perform several queries by sending Query descriptors and receive asynchronously results in QueryHit descriptors (step 4). Step 6 shows how a servent may download a file from another servent . This procedure is done offline with the HTTP protocol. If the servent is firewalled, then may request from , with a Push descriptor (step 5), to "push" the file.
In this section we will present several different kind of attacks that we have implemented using the Jtella API [10]. Jtella is an open source Gnutella Application Protocol Interface written in Java, which allows the creation of Java based Gnutella Servants with custom behavior (i.e. not according to the Gnutella Specification). We have to mention that most of the commercial Gnutella Clients are vulnerable to our attacks, which can unsettle the regular operation of a big part of the Gnutella network. This happens since the Gnutella security relies to much on the behavior of the participating clients instead of build-in mechanisms which could prevent such attacks.
Spamming is well known in the Email World. We all receive unsolicited emails from time to time, with advertisements or sometimes even viruses. Although some user may easily trash a spammed email message; and there would be no other consequence for him, the same wouldn't happen with a spammed query result in the Gnutella Network. If we accept to download a spammed query result we might become an active part of a Distributed Denial of Service (DDoS) Attack against some Internet Host.
Gnutella has no provisions to thwart Distributed Denial of Service Attacks. DDoS attacks are probably the first thing that comes in somebody's mind when we think of a huge number of "uncontrolled" participating hosts. Unfortunately no provision has been taken to avoid such attacks, neither in the protocol nor in the majority of the Gnutella clients. A DDoS attack can easily be achieved in Gnutella since a may "lie" by answering positively to any query it receives and urge to download a resource from , which is the host under attack. A Denial of service threat, according to E. Amoroso [2], arises whenever access to some computer system resource is intentionally blocked as a result of malicious action taken by another user. Although this definition is not exactly the case in Gnutella, since a which is participating in the attack is not intentionally creating the load on , but it simply tries to download the file it was said to be there, this kind of attack is still considered a DoS. Such attacks are also extremely difficult to identify, on behalf since requests are launched from many different IPs and domains.
For the purpose of the experiment we have configured an Apache Webserver [15] on a Pentium III, 900MHz Windows host with 100Mb LAN connectivity. The Web-server will be the target of the attack performed by . We have also modified the default LogFormat attribute in the configuration file of apache, conf/http.conf, in such a way that it will log the User-Agent of the requesting Peer. This will help us to list the Gnutella Clients that are actually vulnerable to this attack. A detailed overview of the topology is shown in Figure [2]. As you may see and are both performing a Query with different keywords in step 1. is simply forwarding the message to which is the malicious host. will reply to both messages, in step 2 with a QueryHit response urging them to download the file from . We have to recall that QueryHit message is very "attractive"; since it indicates that the file is located on a server with a LAN connection. It moreover tries to make the filename as much similar, to the initial query, as possible. In our experiment we have chosen to reply to all messages with a "dressed" filename that consists of the initial keywords along with a random extension of popular media files, such as ("mp3", "mpeg", "asf", "wma"). We have also chosen a random filesize which was in the range in order to make the results as attractive as possible. Gnutella clients such as LimeWire would rank our result as a top quality result, since the filename contains all the query keywords.
The generated load on could be so high that would be unable to accept additional connections or even bring it down in some cases. In our case, we early realized that our Apache Server was not able to respond to the generated load. That happens since our fake responses may propagate to many , which subsequently may try to download the file from . must allocate some local resources in order to service the fake request. In our experiment, as we will show later on, we were able to respond to Query requests in an 8 hour period. Of course all replies may not reach the designated users since many messages can be discarded, due to the adhoc nature of the network. Moreover all remaining users may not rush to download the file is pretending to be on . Although not related to the attack. It is also interesting to mention an approximation of the overhead generated by each QueryHit. If we assume that each is connecting to averagely 5 other peers (which is usually much higher) and will forward a message only in a depth of 7 (i.e. TTL=7), which is also typical for a Gnutella client and assuming that each peer is using message flooding, which means that fake responses will be sent to all of its peers, we will averagely reach other . What we actually want to see in this experiment is how dangerous this attack can turn.
Another important aspect of this attack is that most of the will insist on trying to download the file from on a virtually endless basis. We have observed peers trying to download the file from the Web-Server (after was shutdown), well after a period of 20 hours. This happens because most Gnutella clients are designed in such a way that when a download fails, the client will automatically try to download the file again after a given interval. We have observed that this interval is sometimes very small (e.g. 2 seconds) which subsequently means that the cheated will automatically try to re-establish a connection with after 2 seconds. That means that the attack will be re-launched automatically from the participating cheated hosts on a clock-wise basis or until the user of chooses to manually cancel the download. Most of the times, this is not the case because many users choose to download a file and then may leave the system running for several hours or until their download is satisfied.
We finally mention that some unknown sources are already taking advantage of this security flaw, with the difference that they don't intend to create a DDoS attack but they want to create some extra traffic on some web sites for commercial purposes. More specifically a malicious replied to any Query with a QueryHit which prompts a Gnutella users to download an HTML page. As soon as the user downloads and launches the specific page, he is automatically redirected to a commercial web site.
Table [1] presents some of the Gnutella Clients which are vulnerable to the DDoS through Spamming Attack. We can see that the most of the commercial Clients are affected by this attack. We need to mention that most of the requests made to our Apache Server had no User-Agent field in the HTTP request, which indicates that many other unlisted Clients are also affected by this attack
J.Riiter, one of the founding developers of Napster [20] presents in [11] an interesting set of mathematical equations which aim to calculate the reachability, capacity, and bandwidth throughput of the Gnutella Network. His assumptions are too broad, which makes it difficult to validate the correctness of his equations. We consequently performed an experiment with 30 misbehaving peers {, } distributed on 3 hosts, in order to get a better understanding. Each host launched 10 , each of which connected to a Gnutella Host-Cache Server to obtain a list of peers currently connected to the Gnutella Network. After a period of 2, 4, 6, 8 hours we connected to the Gnutella Network with a commercial Gnutella client, namely Morpheus, and performed several queries to see if we can actually receive an answer from one of the . In all cases we were able to receive a spammed message from one or more of the 30 . After shutting down the 30 we observed that the misbehaving peers answered to Query requests in the 8 hour period. In most occasions the spammed message indicated that the resource is available 7 hops away, which is typically the maximum TTL in Gnutella. This experiment proves that spammed messages can affect a large number of Gnutella users and spread confusion all over the Gnutella Network.
In order to gather a MB log file which consists of logged requests, we restarted the firewall several times. After the rate increased beyond we were unfortunately not able to bring the firewall up since it collapsed on initialization. Many system crashes also occurred so we were not able to gather concrete results. We believe that if we were using a different platform and a different set of tools we would be able to gather better results. Although these problems, we extracted some useful statistics which are not affected by the fact that the requests have not been taken contiguously.
|
Table [2] provides a set of statistics gathered from the ZoneAlarm Firewall Log File. In order to extract and manipulate the data we wrote a custom ZoneAlarm Log Analyzer. Although the system was running for 1 hour it basically collected data only for 19 minutes due to the system crashes that were described before. As we can see from the table, the Unique number of IP addresses that did a request from our Webserver was , which implies that a host is requesting a particular document several times, since the total number of Logged Requests is . This validates our observation that a Gnutella Client will insist on downloading a particular spammed QueryHit. Supportive to this fact is also the Maximum number of requests done by 1 IP address; which is and which again shows that a cheated Gnutella client will return after some interval to reclaim the file. The table also shows that the generated load is considerably high ( requests/second), taking into account that this number could be much larger if we were able to log all the requests.
Since is a misbehaving peer it will respond with a Pong which contains the IP and the PORT of another host which is the host under attack. would subsequently think that it established a connection with , cache address instead of and forward all Query messages to . In figure [6] we can see analytically the 3 steps that were described above.
As we mention in the beginning of this subsection, this attack is not comparable to the DDoS through Spamming because this attack will last only for a short period. This happens because will after a short interval re-send a Ping message to all the hosts in its cache in order to re-discover the topology. Since is not really a Gnutella client and is not able to reply with a Pong message to request, will simply remove from its cache and the attack will terminate.
A typical internet user has an IP address which is never broadcast and is not generally known to anyone else other than the Internet server on which this user connects. The nature of P2P networks though requires a user to advertise his address to other Peers in order to achieve connectivity with them. It is well known that malicious hackers are constantly scanning the Internet for IP addresses and that they try to break into these systems if the systems are not secured adequately. These hackers usually ping random IPs with the hope that some IP might be up and running a specific service, such as a Webserver, or are targeting high-visibility Internet Servers that have a well known address. The Gnutella network might facilitate the activities of these persons since harvesting IP addresses in the Gnutella network is very efficient. Another interesting issue is that users of the Gnutella Protocol are usually sharing some common behavior, which could be translated in terms of Operating System they run, servers that are running on their hosts (i.e. their Gnutella Client along with its mini Web Server) as well as their Security Ignorance. A Malicious hacker which identifies a security flaw in one or more of the poorly written Gnutella Clients could harvest for IP addresses and then launch a massive attack against the Gnutella users. This kind of attack is obviously easier for a hacker than trying to hack into a high-visibility Internet Server which is usually shielded with the latest protective software, such as firewalls. The IP addresses that somebody can discover are usually not very helpful after some time since most users of such systems are either using dial-up connections; hence their IP address changes over time, or are assigned dynamic addresses with protocols such as DHCP. The problem is bigger though for users which have static IP address and might run the Gnutella Clients around the clock. An example of such users, are the users of the ".edu" domain, which are estimated [26] to contribute with of the Gnutella volume.
In order to demonstrate how many IP addresses somebody can actually obtain from the Gnutella Network we have written an IP-Harvesting Engine. Given that each host can arbitrary disconnect from the network we need to be able to discover their IP addresses in a short time. For the purpose of the experiment, we have implemented, again with the Jtella API, a multi-threaded IP-Harvesting Engine. The engine connects to a set of pre-specified Gnutella Host's Caches, obtains an initial set of peers which are active in the network and tries to connect with them collecting each time their IP address. In figure [7], we presents how we have discovered unique IP address in a period of 5 minutes. IP addresses are tailored along with the port numbers of the corresponding Gnutella clients. This information could facilitate malicious hackers which are seeking for "fresh" IP addresses.
Upon receiving the Push descriptor, should try to establish a TCP/IP connection with on the given IP and PORT address and after exchanging another pair of descriptors, which are not significant in our context, transmit file's contents. The problem that arises with the Push message is that may inject viruses to at any time since blindly accepts whatever is sent by . The only thing that needs to be done from behalf is to initially "lie" to any Query it receives by replying with a QueryHit that contains a file that doesn't exist. We have not validated the hazard of this attack but we need to mention that some Gnutella Clients, such as LimeWire, will filter out "dangerous files" (e.g. files having a *.vbs or *.exe extension).
1.0
This document was generated using the LaTeX2HTML translator Version 99.2beta8 (1.42)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 gnutella.tex
The translation was initiated by root on 2002-01-24