Difference between revisions of "FAQ eD2k-Kademlia"
Revision as of 20:29, 17 January 2006
Contents
- 1 F.A.Q. on eD2k-Kademlia
- 2 What is ED2K?
- 3 What is Kademlia?
- 4 Is Kademlia the same as Overnet?
- 5 What is a chunk?
- 6 What is a hash?
- 7 Why after searching, some files which are the same appear as a different file in the results, although they even have the same name?
- 8 What is LowID and HighID?
- 9 Which ports do I have to configure in a firewall or router to run aMule?
- 10 What does each port do?
- 11 Are there any limitations on the ED2K network?
- 12 Are there any limitations on the Kademlia network?
- 13 In search window, what filter stands for which filetype?
- 14 What is a source?
- 15 What is all this talk about credits, ratings and scoring about?
- 16 What is a slot?
F.A.Q. on eD2k-Kademlia
What is ED2K?
ED2K is a protocol originally used by the P2P (Peer-to-Peer) client eDonkey2000, which is where the name comes from. It is a server-client based protocol, with the ability to exchange sources between clients.
The ED2K network is server based like many other P2P networks such as Kazaa (Kazaa is server based, but hides the server connection from the user), which means that the first thing you do when you run aMule is to connect to a server (either manually or automatically).
Once successfully connected to a server, the client can search for any file, either locally (on the connected server) or globally (on all servers), and the servers queried will provide the client with a list of all the files which match the search parameters.
If the user starts a download, the client will then ask the server for sources, which the server will return in the form of IP addresses for the clients that have told the server that they have the specific file.
Then the remote client will begin to upload a whole chunk to your client as soon as you are first in its queue, and when the chunk has been completely sent, you will be placed back in its upload queue. This way different chunks get spread around the ED2K network, so that, even if no one has the complete file at a given moment, it can still be completed by downloading the different chunks from different people (it is well known that users tend to stop sharing a file once it has been completed).
Note that a client uploads only one chunk at a time to another client. Even if a client is in the upload queue for two different files of the same user and gets to the top of both, that user will only upload one of the files to that client (the other upload, depending on the ED2K application the client uses, will probably remain as a maximum priority upload, but will not begin until the other chunk has been successfully uploaded).
If both users have HighID (see What is LowID and HighID?) the transfer will be done directly from client to client (Peer-to-Peer), but if one of the clients has LowID, the connection will be established through the server, since a LowID client cannot accept incoming connections. As a result, two LowID clients cannot connect to each other.
What is Kademlia?
Kademlia is a natural evolution of the ED2K network. Kademlia is the future. See Are there any limitations on the ED2K network? for more information on why Kademlia is necessary.
Since Kademlia is a decentralized network, it removes the bottleneck that was previously caused by the need for servers (though Lugdunum has done great work in reducing this bottleneck). Now, instead of connecting to a server, you just connect to a client (with a known IP address and port) which supports the Kademlia network. This is called bootstrapping.
Once connected, depending on your ability to accept incoming connections, you are given either "open" or "firewalled" status, which is similar to the HighID and LowID of the ED2K network. Then you are given an ID.
When searching, each client acts as a small server and is given responsibility for certain keywords or sources. This adds to the complexity of finding sources, as you no longer have a central server to ask, but instead will have to propagate the query through the network.
aMule has supported Kademlia since version 2.1.0.
Is Kademlia the same as Overnet?
Short and clear: No. Overnet is the natural serverless evolution of the eDonkey software, while Kademlia is the natural serverless evolution of *Mule clients. Both are based on the original Kademlia algorithm but have been applied in different ways and are therefore incompatible. So, it's the same philosophy, but different rules. To learn how Overnet works, refer to http://www.edonkey2000.com/documentation/how_on.html but keep in mind that Overnet's development is closed until it reaches version 1.0, while Kademlia's development has been completely open from the very beginning.
What is a chunk?
In the ED2K protocol, to avoid sharing corrupt files, each file is divided into various parts, which are known as chunks, and then each chunk is hashed (read below to know what a hash is). Each chunk is 9.28MB in size, so a 15MB file will be divided into two chunks (9.28MB + 5.72MB), a 315KB file will be a single chunk and a 100MB file will be divided into 11 chunks (10x9.28MB + 7.2MB).
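The chunk arithmetic above can be sketched in a few lines of Python. This is only an illustration: the exact chunk size of 9728000 bytes is an assumption (the text gives only the rounded figure 9.28MB, read here as 1024x1024-byte units), and the names `CHUNK_SIZE`, `MIB` and `chunk_sizes` are made up for this example.

```python
CHUNK_SIZE = 9_728_000  # bytes; assumed exact value behind the rounded "9.28MB"
MIB = 1024 * 1024       # the "MB" of the examples, read as 1024*1024 bytes


def chunk_sizes(file_size: int) -> list[int]:
    """Return the size in bytes of every chunk of a file of `file_size` bytes."""
    if file_size < 0:
        raise ValueError("file size cannot be negative")
    full_chunks, remainder = divmod(file_size, CHUNK_SIZE)
    sizes = [CHUNK_SIZE] * full_chunks
    if remainder:
        # The last chunk is simply whatever is left over.
        sizes.append(remainder)
    return sizes


for size in (15 * MIB, 315 * 1024, 100 * MIB):
    print(size, "bytes ->", len(chunk_sizes(size)), "chunk(s)")
```

With these assumptions the three files from the paragraph above come out as two, one and eleven chunks respectively.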
What is a hash?
Dividing each file into chunks (see What is a chunk?) will avoid the problem of downloading a whole corrupted file since only the corrupted chunk will have to be downloaded again, but a method to identify corrupted chunks is needed. This is done by using MD4 hashes.
An MD4 hash is a value computed for each chunk by a mathematical operation over every single bit of the chunk. This means that modifying a single bit in a chunk results in a completely different hash, which lets the client verify the integrity of each part of a file as it is downloaded.
Not only are the chunks hashed: in order to get a file hash, all the chunks' hashes are concatenated one after the other in their file order (that is: chunk1's_hash+chunk2's_hash+chunk3's_hash+...) and the resulting string is hashed. This way, each file on the ED2K network has a unique identifier. The file hash isn't taken from hashing the whole file, but from hashing the concatenated chunk hashes.
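The hash-of-hashes scheme just described might be sketched as follows. This is a rough illustration, not the code of any real client: it follows only the description above, so any special cases real clients may apply (for example for very small files) are not modelled. The chunk size of 9728000 bytes and all function names are assumptions. The digest function is passed in as a parameter because `hashlib` only offers MD4 when the underlying OpenSSL build still provides it.

```python
import hashlib
from typing import Callable

CHUNK_SIZE = 9_728_000  # bytes; assumed exact value behind the rounded "9.28MB"


def md4(data: bytes) -> bytes:
    """MD4 digest via hashlib; raises ValueError if this build has no MD4."""
    return hashlib.new("md4", data).digest()


def file_hash(
    data: bytes,
    digest: Callable[[bytes], bytes] = md4,
    chunk_size: int = CHUNK_SIZE,
) -> str:
    """Hash every chunk, concatenate the chunk hashes in file order, hash again."""
    chunk_hashes = [
        digest(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]
    return digest(b"".join(chunk_hashes)).hex().upper()


def sha256(data: bytes) -> bytes:
    """Stand-in digest, used only so the demo runs where MD4 is unavailable."""
    return hashlib.sha256(data).digest()


# Tiny demo with 4-byte chunks: changing one byte changes the file hash.
print(file_hash(b"abcdefgh", digest=sha256, chunk_size=4))
print(file_hash(b"abcdefgX", digest=sha256, chunk_size=4))
```

Passing `digest=md4` (the default) gives the MD4-based variant the text describes, on systems where MD4 is available.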
In reality, you need both the hash of a file and its size. These pieces of information are embedded in the ED2K URLs found in many places.
Take this for example:
ed2k://|file|eMule0.42f-Sources.zip|2407949|CC8C3B104AD58678F69858F1F9B736E9|/
The interesting parts are "2407949", which is the size of the file in bytes, and the last part, "CC8C3B104AD58678F69858F1F9B736E9", which is the hash itself, written as 32 hexadecimal characters.
The filename itself is irrelevant in the process of identifying the file.
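As a small illustration, the fields of such a link can be pulled apart by splitting on the `|` character. The sketch below handles only plain file links shaped like the example above; the names `Ed2kFileLink` and `parse_ed2k_link` are invented for this example, and it makes no attempt to handle escaped characters or any extra fields a link might carry.

```python
from typing import NamedTuple


class Ed2kFileLink(NamedTuple):
    name: str
    size: int       # file size in bytes
    file_hash: str  # 32 hexadecimal characters


def parse_ed2k_link(link: str) -> Ed2kFileLink:
    """Split an ed2k file link into name, size and hash."""
    parts = link.split("|")
    if len(parts) < 5 or parts[0] != "ed2k://" or parts[1] != "file":
        raise ValueError("not an ed2k file link")
    name, size, file_hash = parts[2], parts[3], parts[4]
    if len(file_hash) != 32 or any(c not in "0123456789abcdefABCDEF" for c in file_hash):
        raise ValueError("hash must be 32 hexadecimal characters")
    return Ed2kFileLink(name, int(size), file_hash.upper())


link = "ed2k://|file|eMule0.42f-Sources.zip|2407949|CC8C3B104AD58678F69858F1F9B736E9|/"
parsed = parse_ed2k_link(link)
print(parsed.size, parsed.file_hash)
```

Only `size` and `file_hash` matter for identifying the file; `name` is kept purely for display.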
Why after searching, some files which are the same appear as a different file in the results, although they even have the same name?
If you understood "What is a hash?" you will understand this quickly. When a search is started, the server tells the ED2K client the filename and the hash of the complete file for each file which matches the search. If two files that seem to be the same differ in their content, no matter whether the difference is big or small, their hashes are different, so they are considered different files. That's also the reason why two files with different filenames can appear as the same file: on the ED2K network, the filename isn't important, the hash is.
What is LowID and HighID?
Each client is assigned an ID (identification) number which is unique and distinguishes the client from all other clients on the server. If this ID is below 16777216 (about 16 million) you have a LowID; anything above that is a HighID. Whether your client receives a HighID or a LowID depends on your client and on whether or not the Client TCP port is open. Client TCP Port is a customisable option located in Preferences -> Connection. The default port is 4662, which is fine. As explained in What is ED2K?, clients with LowIDs cannot connect to other clients with LowIDs, which reduces transfer rates significantly. This is why having port 4662 TCP (or the one set in Preferences) open is so important. Some of the larger servers refuse connections from clients with LowIDs, since LowID clients have their data transferred through the server rather than directly from the other client, which in turn adds more overhead for the server.
For clients with a HighID, the ID is the result of a mathematical operation on their IP address: A + 256*B + 256*256*C + 256*256*256*D, where the IP is A.B.C.D. Keep in mind that this ID serves identification purposes as well. Apart from being over or under 16777216, it does not matter whether the ID is bigger or smaller. This means a client with an ID of 50000000 isn't any better than a client with an ID of 49999999. Note that servers which are incorrectly configured or very busy sometimes issue LowIDs to clients even though port 4662 TCP is open. This is rare, but it can happen.
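The ID formula above is easy to try out. The following sketch simply applies A + 256*B + 256*256*C + 256*256*256*D to a dotted IP address and compares the result with the 16777216 threshold; the function names are invented for this example, and how a server actually hands out LowIDs is not modelled.

```python
LOW_ID_LIMIT = 16_777_216  # 256 * 256 * 256


def high_id_from_ip(ip: str) -> int:
    """Apply A + 256*B + 256*256*C + 256*256*256*D to the IP A.B.C.D."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("expected an IPv4 address such as 1.2.3.4")
    a, b, c, d = octets
    return a + 256 * b + 256 * 256 * c + 256 * 256 * 256 * d


def is_low_id(client_id: int) -> bool:
    """IDs below the limit are LowIDs."""
    return client_id < LOW_ID_LIMIT


client_id = high_id_from_ip("1.2.3.4")
print(client_id, "LowID" if is_low_id(client_id) else "HighID")
```

For the address 1.2.3.4 this gives 1 + 512 + 196608 + 67108864 = 67305985, which is a HighID.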
If you're unsure about having proper port settings, you can test your ports here.
Which ports do I have to configure in a firewall or router to run aMule?
One has to distinguish between incoming and outgoing connections. Normally, all ports of a router are open for sending data (outgoing connection).
So, in this normal case, you only have to configure the ports for incoming connections.
aMule works even with no specific ports opened, but you won't get a HighID in this case. As mentioned above, to be given a HighID, port 4662 TCP (or the one set in the Preferences) must be listening (i.e. opened in your firewall and forwarded in your router).
Apart from that port, to have an optimal ED2K experience, two more ports should be enabled for listening as well: UDP ports 4672 and 4665 (the latter being TCP_PORT+3). Both can be changed to any other number in the Preferences.
What does each port do?
Well, since most ports can be configured to be set to any other number, the defaults will be listed. The traffic direction is from client perspective (you):
- 4661 TCP (outgoing): Port on which a server listens for connections (defined by the server).
- 4662 TCP (outgoing and incoming): Client to client transfers.
- 4665 UDP (outgoing): Used for global server searches and global source queries. This is always the Client TCP port + 3.
- 4672 UDP (outgoing and incoming): Extended eMule protocol, Queue Rating, File Reask Ping
- 4711 TCP: WebServer listening port.
- 4712 TCP: External Connection port. Used to communicate aMule with other applications such as aMule WebServer or aMuleCMD.
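Since one of the ports is derived from another, a tiny sketch may help when the defaults are changed. It assumes nothing beyond the list above: the client TCP port and the extended-protocol UDP port are set in the Preferences, and the UDP port for global server queries is always the client TCP port plus 3. The function name and dictionary keys are made up for this example.

```python
def client_ports(client_tcp: int = 4662, client_udp: int = 4672) -> dict[str, tuple[int, str]]:
    """Return the client-side ports that follow from the two configurable ones."""
    return {
        "client_to_client": (client_tcp, "TCP"),
        "server_udp_queries": (client_tcp + 3, "UDP"),  # always client TCP port + 3
        "extended_protocol": (client_udp, "UDP"),
    }


for purpose, (port, protocol) in client_ports().items():
    print(f"{purpose}: {port} {protocol}")
```

Calling `client_ports(5000)` would therefore put the server query port at 5003 UDP.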
Are there any limitations on the ED2K network?
Not many, but yes, there are: two natural limits and a "forced" limitation. The two natural limits have already been mentioned. The first is the problem with LowID users (their transfers involve data going through the server, and two LowID clients can't share with each other). The second is that, although ED2K is a P2P protocol, it needs servers to establish the P2P connection. This latter one is solved in the Kademlia protocol.
The "forced" limitation exists only to make sure that clients share, so that the ED2K network will not disappear: clients which have an upload limit of X KBps, where X is between 0 and 3.99 (both included), can download at a maximum of X*3 KBps. Clients which have an upload limit of Y KBps, where Y is between 4 and 9.99 (both included), can download at a maximum of Y*4 KBps. Clients with an upload limit of 10 KBps or more have no download limit. This restriction is set in the client application, so it could be bypassed by hacking the code, but that would probably result in being banned from the servers you connect to.
Also, any client is forced to allow at least three upload slots, so it's not possible to allow more than upload_limit/3 KBps per slot.
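The two rules above can be written down as a short sketch. It only restates the numbers given in the text (3x below 4 KBps, 4x below 10 KBps, no limit from 10 KBps upwards, and at least three upload slots); the function names are invented, and `None` is used here to mean "no download limit".

```python
from typing import Optional


def max_download_kbps(upload_limit_kbps: float) -> Optional[float]:
    """Download cap implied by an upload limit; None means unlimited."""
    if upload_limit_kbps < 0:
        raise ValueError("upload limit cannot be negative")
    if upload_limit_kbps < 4:
        return upload_limit_kbps * 3
    if upload_limit_kbps < 10:
        return upload_limit_kbps * 4
    return None


def max_kbps_per_slot(upload_limit_kbps: float) -> float:
    """With at least three upload slots, no slot can exceed a third of the limit."""
    return upload_limit_kbps / 3


for limit in (3, 5, 10):
    print(limit, "KBps up ->", max_download_kbps(limit), "KBps down at most")
```

So an upload limit of 3 KBps allows 9 KBps down, 5 KBps allows 20 KBps, and 10 KBps or more removes the cap.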
There is one last limit: the network file size limit is slightly under 4GB (exactly 4294967295 bytes, although aMule will only support files up to 4290048000 bytes).
Additionally, servers will only send 300 results for your searches, so don't expect any more results. This is not an eD2k limitation but a server limitation.
And on the client side, filenames are usually limited to 161 characters.
Are there any limitations on the Kademlia network?
- As it is a network derived from the ed2k network and therefore has to maintain compatibility when it comes to identifying files uniquely, the 4GB maximum file size limit exists in the Kademlia network too.
- The same applies to the 161-character filename limit.
In search window, what filter stands for which filetype?
Keep in mind that the filters in the search window don't depend on the file type, but on the extensions of the filenames, in the following way:
- Archive: .ace .arj .rar .tar.bz2 .tar.gz .zip .Z
- Audio: .aac .ape .au .mp2 .mp3 .mp4 .mpc .ogg .wav .wma
- CDImage: .bin .ccd .cue .img .iso .nrg .sub
- Picture: .bmp .gif .jpeg .jpg .png .tif
- Program: .com .exe
- Video: .avi .divx .mov .mpeg .mpg .ogg .ram .rm .vivo .vob
So, a movie file that has the name "Birthday.zip" will appear in the Archive filter, but not in the Video filter.
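To make the point concrete, here is a minimal sketch of extension-based filtering using the lists above. It is not aMule's actual code: the function name is made up, and comparing extensions case-insensitively is an assumption of this example. Since .ogg is listed under both Audio and Video, the function returns every filter that matches.

```python
FILTERS = {
    "Archive": (".ace", ".arj", ".rar", ".tar.bz2", ".tar.gz", ".zip", ".Z"),
    "Audio": (".aac", ".ape", ".au", ".mp2", ".mp3", ".mp4", ".mpc", ".ogg", ".wav", ".wma"),
    "CDImage": (".bin", ".ccd", ".cue", ".img", ".iso", ".nrg", ".sub"),
    "Picture": (".bmp", ".gif", ".jpeg", ".jpg", ".png", ".tif"),
    "Program": (".com", ".exe"),
    "Video": (".avi", ".divx", ".mov", ".mpeg", ".mpg", ".ogg", ".ram", ".rm", ".vivo", ".vob"),
}


def matching_filters(filename: str) -> list[str]:
    """Return the filters whose extension list matches the end of the filename."""
    lowered = filename.lower()  # assumption: extensions are compared case-insensitively
    return [
        name
        for name, extensions in FILTERS.items()
        if lowered.endswith(tuple(ext.lower() for ext in extensions))
    ]


print(matching_filters("Birthday.zip"))
print(matching_filters("concert.ogg"))
```

The file content is never inspected, which is exactly why "Birthday.zip" lands in the Archive filter even if it holds a movie.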
What is a source?
A source is a client which is sharing some chunk of a file you have in your download queue and have not yet completed. Obviously, the more sources you can get for a given file, the more chances you have to download the file and the quicker you'll download it. Keep in mind that there's a difference between "sources" and "available sources" if you're on LowID: "sources" stands for clients sharing a chunk or file you still haven't completed, while "available sources" stands for clients sharing a chunk or file you still haven't completed and from whom you can download (that is, sources which are on HighID).
What is all this talk about credits, ratings and scoring about?
All three concepts have to do with the way in which the ED2K network decides who comes first in the upload queues.
The score is the most important value: the client with the highest score will be the next client you provide a slot to. The score is calculated as follows: score = rate x time_waiting_in_seconds / 100. So, to understand this, we must know what rate is.
Rate can be understood as an objective preference, that is, the preference a client is given regardless of how long it has been waiting. When a client is added to the upload queue, it gets a rate of 100. This value is then modified as follows:
- According to the amount of credits, the rate will be multiplied by 1x to 10x.
- Depending on the file priority, it will be multiplied by 0.2x to 1.8x (Release 1.8x, High 0.9x, Normal 0.7x, Low 0.6x, Very Low 0.2x).
- Users on specific old clients which put too much load on the network will be penalized by multiplying their rate by 0.5x.
- Banned clients will instantly get no rate (that is, their rate will be multiplied by 0).
These multipliers are known as "modifiers". Clients with a modifier value strictly bigger than 1 will be marked as yellow in the icon.
So only credits are left to explain. Credits are a reward you get for uploading files to a specific user. Credits are exchanged between two specific clients and are not global, so you can't view your own credits, although you can see the credits any other user has on you (that is, the credits you owe that client). Since credits are managed by the uploading client, you might be uploading to a client with no credits support, so you will gain no credits on it, although that client will still get credits on you if it uploads to you, since you do have credits support. These credits are stored in the clients.met file.
The credits modifier used by rate is the lower of these two values:
(upload_total x 2)/download_total or sqrt(upload_total+2), where both upload_total and download_total are measured in MB.
If the result is lower than 1, then it is set to 1 and if it is bigger than 10, it is set to 10. In addition, if the uploaded total is less than 1MB, the modifier is set to 1 and if the downloaded total is equal to 0, then the modifier is set to 10.
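Putting the pieces together, the whole calculation might look like the sketch below. It only restates the rules given in this section and is not taken from any client's source: all names are invented, and the order in which the two special cases are checked (less than 1MB uploaded first, then zero downloaded) is an assumption, since the text does not say which one wins when both apply.

```python
import math

PRIORITY_MODIFIER = {
    "Release": 1.8, "High": 0.9, "Normal": 0.7, "Low": 0.6, "Very Low": 0.2,
}


def credit_modifier(upload_total_mb: float, download_total_mb: float) -> float:
    """Lower of (upload_total x 2)/download_total and sqrt(upload_total+2), kept within 1..10."""
    if upload_total_mb < 1:       # assumed to take precedence over the next rule
        return 1.0
    if download_total_mb == 0:
        return 10.0
    ratio = upload_total_mb * 2 / download_total_mb
    root = math.sqrt(upload_total_mb + 2)
    return min(10.0, max(1.0, min(ratio, root)))


def rate(credit_mod: float, priority: str = "Normal",
         old_client: bool = False, banned: bool = False) -> float:
    """Start from 100 and apply the modifiers listed above."""
    if banned:
        return 0.0
    value = 100.0 * credit_mod * PRIORITY_MODIFIER[priority]
    if old_client:
        value *= 0.5
    return value


def score(rate_value: float, seconds_waiting: float) -> float:
    """score = rate x time_waiting_in_seconds / 100"""
    return rate_value * seconds_waiting / 100


modifier = credit_modifier(upload_total_mb=10, download_total_mb=10)
print(modifier, score(rate(modifier, "Normal"), seconds_waiting=600))
```

In this example the two candidate values are 2 and sqrt(12), so the modifier is 2; with Normal priority the rate is 100 x 2 x 0.7 = 140, and after ten minutes in the queue the score is 140 x 600 / 100 = 840.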
What is a slot?
When uploading files, your upload bandwidth (which may vary depending on the upload limit or the natural connection-type upload limit) will be divided into slots. So, each slot is an amount of KBps which will be assigned to each client who tries to download from you.
