BitTorrent document


Table of Contents

1. BitTorrent class organization overview
1.1. Network connection handling
1.2. Data transmissions and rate limiting
1.3. Piece selection
1.4. Storage handling
2. Detailed class information
2.1. The Storage class
2.2. The StorageWrapper Class
2.2.1. The new_request(piece#) function
2.2.2. The piece_came_in(piece#, offset, slicedata) function
2.2.3. Data members places,rplaces
3. Debug BitTorrent code
4. Summary of BT code changes to enable share-downloading
4.1. Major changes
4.2. Other notes
4.3. Some initial experiment results

Last updated: 07/13/2006.

Chapter 1.  BitTorrent class organization overview

Here we focus on the command-line interface provided by BT client only (btdownloadheadless.py).

In btdownloadheadless.py, class HeadlessDisplayer is used to print some run-time statistics to standard output to notify user of the progress and class DL is used to drive the downloading process.

Class DL calls several other classes in download.py through its member function run to join the torrent and start downloading/uploading. These classes can be divided into several categories.

1.1.  Network connection handling

A BT client needs to initiate connections to the tracker and its peers. The communication between BT clients and tracker is through HTTP and the duration of connection is relatively short. When a BT client finishes downloading and before it is closed by an end-user, it serves as a seeder and uses upload rate to decide which peers to upload more instead.

Class RawServer (defined in RawServer.py) is one of the most important classes. It multiplexes IOs (for both server and client sockets) and is responsible for clearing timed-out sockets. It avoids using multiple threads or multiple processes and thus is much more efficient. Class SingleSocket is a simple wrapper class around the Python socket library and is also defined in RawServer.py. It is used to handle data in the buffer and send data through the socket. In BT client, other socket-related classes depend on the services provided by both RawServer and SingleSocket to focus on data transmissions and receptions without being bogged down to network connection handling details.

1.2.  Data transmissions and rate limiting

With the services provided by RawServer, several classes implement the BT protocol itself, handling peer-to-peer and peer-to-tracker information exchange and rate limiting. These classes include

  • Connection

    Defined in Connecter.py and is used to handle BT protocol handshake between peers. Its member function data_came_in is called each time the socket receives data from other peers. Function _read_message is used to analyze each field in the received messages. A Connection object is created for each peer and tracker.

  • Rerequester

    Defined in Rerequester.py and is used to handle BT protocol handshake between peer and tracker, e.g., request for peer list, updating its own status and handling peer list.

  • Upload

    Defined in Uploader.py and is responsible for uploading pieces/blocks to peers.

  • SingleDownload

    Defined in Downloader.py and is responsible for downloading from peers. A SingleDownload object is also created for each connecting peer. It maintains a list of all the active requests it sends out, keeps track of download rate from peers and checks if remote peer has any pieces local client is interested in. Its member functions respond to different messages received from peers and then act accordingly.

  • Choker

    Defined in Choker.py and is used to decide which peers to choke or unchoke. The ``TFT'' algorithm is implemented here.

1.3.  Piece selection

Though this is related to data transmissions, we list it separately because it is the part upon which we make our changes. There are several related classes here:

  • Bitfield

    Defined in bitfield.py. This is a helper class that provides easy access to bit strings that identify which pieces a peer already has and which pieces it needs to get.

  • PiecePicker

    Defined in PiecePicker.py. This implements the piece selection algorithms, e.g., the rarest first algorithm and the random first algorithm. It also keeps record of which blocks in a specific piece the client has and needs to request.

    In fact, we can modify the class such that a BT client only requests the pieces that we need, for example, the first half of the pieces for a file.

1.4.  Storage handling

The Storage class (defined in Storage.py) provides lower-level functions to open, read and write files and pieces. The StorageWrapper class (defined in StorageWrapper.py) provides higher-level functions to read and write BT pieces. For example, it is possible to read or write a piece with specified index through StorageWrapper.

Besides, StorageWrapper also ensures that pieces are assembled in order when it writes the file even though these pieces can be received out of order.

Chapter 2.  Detailed class information

2.1.  The Storage class

In one torrent file, it may contain multiple files for ease of distribution. For our purpose, we just focus on the case of one file to download in one torrent file. Still, BT uses the FilePool class to maintain multiple file handles.

(What is a file handle? It is a pointer-like structure through which you can open/read/write/close a file. Remember the FILE* in C programming? That is just like what a file handle is.)

As we have known, Storage is a simple wrapper around Python's file functions for reading/writing at specified places. This is important. Why? Because a file to download is divided into several pieces and each piece is further divided into slices as one unit of BT transmissions. These pieces/slices can be received out of order, so it is important to provide some higher level functions to read from or write to specific places.

This is what Storage class provides: It contains a read and write function. With the read function, you can specify the position and the amount of data (in bytes) to read. With the write function, you can specify the position to write to with the data (string). Here a string can contain any characters and does not need to be '\0' terminated.

Now we can look at a simple example to test our understanding of the Storage class. It contains the minimum amount of code to make it work.

from BitTorrent.Storage import Storage,FilePool

# This is a simple test for the functions of the Storage and FilePool classes.
# We need to specify how many files to open (maintained by FilePool) and then
# pass it to the constructor of Storage
# Then we also need to tell Storage the filenames and their sizes (for the time 
# being, we consider only one filename and one size.
# Then we can use the read/write functions to write to the file and finally 
# close it.

if __name__ == '__main__': 
	config = {'enable_bad_libc_workaround': 0}
	myfiles = ['hello.txt']
	sizes = [20]
	filepool = FilePool(1)
	s = Storage(config, filepool, zip(myfiles, sizes), False)
	str = '0123456789'
	# Write from location 10
	s.write(10, str)
	# Reverse the string, now it should be '9876543210'
	str = str[::-1]
	s.write(0, str)
	s.close
	# Now the file should contain '98765432100123456789'
	

The file is also available as teststorage.py that you can test on your machine.

2.2. The StorageWrapper Class

The StorageWrapper class may seem intimidating on first reading. It IS. To initiate an object of this class, we have to define a lot of dummy functions to avoid some compilation and/or run-time errors.

Instead of using real functions from BT code, we just provide some stub functions to satisfy the constructor.

2.2.1.  The new_request(piece#) function

The new_request is to called by other classes (probably Downloader) to request new piece given by index. Actually it returns the slice of the piece that is missing. For example, when we first call this function manually, it should return offset 0 and the length of a slice. When we call it the second time, it should return offset and length that both equal the length of a slice.

How could this happen that each time we call this function we get different results? The reason is that when we first call new_request for a piece, that piece is not requested yet. So the function will automatically populate a list which contains pairs of offset and length. The list is actually the specification for the list of slices for this piece. It is saved in inactive_requests[index] while inactive_requests is a list of lists.

Then the function will remove and return the first element of list inactive_requests[index]. Later whenever it is called, it returns the slice with the minimum offset.

2.2.2.  The piece_came_in(piece#, offset, slicedata) function

Whenever we get a slice from one of our peers (offset and the actual data), then this function will save the slice and do some book-keeping. Suppose one piece contains two slices and we just get one slice, then if we call do_I_have(piece#), then it should return False because we haven't got all the slices yet. Once we get all the slices for a certain piece, then do_I_have should return True.

2.2.3.  Data members places,rplaces

They are lists and mirror each other. places provides mappings from piece number to the allocated storage space and rplaces provides the reverse mappings. For example, suppose we first get piece 9 (starting from 0) and we allocate the first available storage space. So both places[9]==0 and rplaces[0]==9 are true. During the process, some pieces may be swapped and finally when we actually write the file, we need to put them in order.

The example file teststoragewrapper.py is shown below.

import threading
from BitTorrent.Storage import Storage,FilePool
from BitTorrent.StorageWrapper import StorageWrapper

def finished():
	print "Finished\n"

def statusfunc(activity = None, fractionDone = 0):
	print "calling statusfunc\n"

def data_flunked(amount, index):
	print "calling data_flunked\n"

def errorfunc(level, text):
	print "calling error func\n"

# check_hashes need to be set to 1, so that it will try to load fast resume
# file. Because we don't have any resume file, so rplaces will be None, and
# then they will be initialized with UNALLOCATED (-2)
# If we don't do so, run-time error will happen.
if __name__ == '__main__': 
	config = {'enable_bad_libc_workaround': 0, 'check_hashes': 1,
'download_slice_size': 32, 'no_validate_piece': 1}
	myfiles = ['hello.txt']
	sizes = [1024]
	filepool = FilePool(1)
	storage = Storage(config, filepool, zip(myfiles, sizes), False)
	hashes = [1] * 16 
	doneflag = threading.Event()
	piecesize = 1024/16
	wrapper = StorageWrapper(storage, config, hashes, piecesize, finished, statusfunc, 
		doneflag, data_flunked, 'infohash', errorfunc, None)
# Now suppose the pieces come in different orders, say 15, 13, ..., 1, 14, ..., 0
# First need to request it
# What if I don't request it?
	index = 15
	(begin, length) = wrapper.new_request(index)
	assert begin==0 and length == 32
	(begin, length) = wrapper.new_request(index)
	assert begin==32 and length == 32
	strdata = '15' * 16
	result = wrapper.piece_came_in(index, 0, strdata)
# This should be True, meaning we validate and save the slice successfully 
	assert result==True
	result = wrapper.piece_came_in(index, 32, strdata)
	assert result==True
	assert wrapper.do_I_have(index)==True
	assert wrapper.do_I_have(index-1)==False
# I should request first
	index=14
	strdata = str(index) * 16
	(begin, length) = wrapper.new_request(index)
	(begin, length) = wrapper.new_request(index)
	result = wrapper.piece_came_in(index, 0, strdata)
	assert result==True
	assert wrapper.do_I_have(index)==False
	result = wrapper.piece_came_in(index, 32, strdata)
	assert result==True
	assert wrapper.do_I_have(index)
# Now [8-14), i.e., 8, 9, 10, ..., 13
	for i in range(8,14):
		(begin, length) = wrapper.new_request(i)
		(begin, length) = wrapper.new_request(i)
		strdata = ("%02d" %(i)) * 16
		result = wrapper.piece_came_in(i, 0, strdata)
		result = wrapper.piece_came_in(i, 32, strdata)
		assert wrapper.do_I_have(i)
	print wrapper.places
	print wrapper.rplaces
# Please check hello.txt to see the file's content. The text should be in order.
	

Please note this file has been changed to test some additional functions.

Chapter 3.  Debug BitTorrent code

The problem with debugging BT code is that when we use interactive debugger, the socket will be idle for too long (from machine's point of view) and then our peer will close the connection very soon. So this is very challenging. Currently the most straightforward way is to print something at the point we are interested and then do some postmortem analysis after the program finishes.

However, because of some strange intricacies, it is impossible to redirect BT output to a log file. So we cannot just use "print" statement. Instead, we have to write a function to write the interested messages to a log file. For this purpose, we define a helper function in rangedpiecepicker.py called logMessage to log the specified message. In addition, we also need the "config" dictionary to know which log file to write because we don't want to hard-code the log file's name in the code. The function's definition is:

from time import time, strftime

def logMessage(config, msg): 
	t = strftime('[%H:%M:%S] ')
	log = t + msg + "\n"
	if config.has_key('logfile') and len(config['logfile'])>0: 
		f = open(config['logfile'],'a')
		if f:
			f.write(log)
			f.close()
		else:
			print log
	

To use it, you just need to include the following line in the file that you want to use it, e.g., in the StorageWrapper.py file, we added the following line:


from BitTorrent.rangedpiecepicker import logMessage,getOffsetNPiece

	

We will describe the getOffsetNPiece function later.

With this function, you can log interesting information at interested places. Remember, you need to get the "config" directory through some ways.

For example, in the Downloader.py file, it contains definition for the class SingleDownload. For the SingleDownload class, it has a function _request_more. Because this class does not contain the "config" dictionary, so we need to get its member variable downloader which contains the "config" dictionary to do the trick. So the line that calls the function is:

 
logMessage(self.downloader.config, "We show interest in piece " + str(i))
	

That is the place we show our interest in specific piece.

Chapter 4.  Summary of BT code changes to enable share-downloading

4.1.  Major changes

Changes made are made to three files: StorageWrapper.py, defaultargs.py and Connecter.py.

Contents of places[] and rplaces[] arrays often change (some pieces may be reordered), so they are unreliable. An easy way is that each process writes a finished piece to its parent directory and name it like: ../piece1, ../piece2. Other processes can use os.access and os.stat functions to see whether a piece has been downloaded.

Two main functions are added to StorageWrapper.py. One is write_piece. This is called whenever a process finishes downloading one piece and then writes a copy of the piece to its parent directory: ../piece<piece#>.

The other is check_new_piece. This is called whenever a process downloads one piece and is ready to send HAVE message for that piece. At this time, the process can check its parent directory to see whether other pieces (that it is not responsible for) are available. If so, add them to have[] and then send to each peer. This function is called in the _got_message function of Connection in Connecter.py.

The place and name to save another copy of these pieces are specified by the "piece_filename" argument in defaults.py.

Function get_piece is also modified. When a piece that we are not responsible is requested (because we sent HAVE message earlier), we should try to read our parent directory and see if piece<piece#> is available and then read a block.

Such changes also apply to N>2 processes.

4.2.  Other notes

get_have_list in StorageWrapper.py doesn't need to be modified. This is only called when a connection is initially set up between two peers. So return self.have.tostring() is enough.

Since we don't use progress file and we also don't read from the other process' main file directly (we only read individual piece file), so files Storage.py, download.py and Downloader.py don't need any change.

4.3.  Some initial experiment results

The modified program's logic is verified. However, some experiments show that the download speed of the modified clients does not increase even though each modified client can upload pieces to other peers (it itself doesn't download those pieces). So the slowdown may be due to the piece selection algorithm used by BT (rarest first?).