Difference between revisions of "FAQ eD2k-Kademlia"
Revision as of 11:50, 15 October 2004
Contents
- 1 What is ED2K?
- 2 What is Kademlia?
- 3 Is Kademlia the same as Overnet?
- 4 What is a chunk?
- 5 What is a hash?
- 6 Why after searching, some files which are the same appear as a different file in the results, although they even have the same name?
- 7 What is LowID and HighID?
- 8 Which ports do I have to configure in a firewall or router to run aMule?
- 9 What does each port do?
- 10 Are there any limitations on the ED2K network?
- 11 In search window, what filter stands for which filetype?
- 12 What is a source?
- 13 What is all that credits, rate and score stuff about?
- 14 What is a slot?
What is ED2K?
ED2K is a protocol originally used by the P2P (Peer-to-Peer) client eDonkey2000, which is where the name comes from. It is a server-client based protocol, with the ability to exchange sources between clients.
Unlike P2P networks such as Kazaa, the ED2K network is server based, which means that the first thing you do when you run aMule is connect to a server (either manually or automatically).
Once successfully connected to a server, the client can search, either locally (on the connected server) or globally (on all servers), for any file, and the servers asked will provide the client with a list of all the files that match the search parameters.
If the user starts a download, the client will then ask the server for sources, which the server will return in the form of IP addresses for the clients that have told the server that they have the specific file.
Then the remote client will begin to upload a whole chunk to your client as soon as you are first in its queue (see What is all that credits, rate and score stuff about?) and, when the chunk has been completely sent, you will be put back into its upload queue. This way different chunks get spread around the ED2K network, so that, although no one may have the complete file at a given moment, it can still be completed by downloading the different chunks from different people (it is well known that users tend to stop sharing a file once it has been completed).
Note that clients upload only one chunk at a time to another client. Even if a client is in the upload queue for two different files of the same user and gets to the top of both, that user will only upload one of the files to that client (the other upload, depending on the ED2K application the client uses, will probably remain as a maximum priority upload, but will not begin until the other chunk has been successfully uploaded).
If both users have HighID (see What is LowID and HighID?) the transfer will be done directly from client to client (Peer-to-Peer), but if one of the clients has LowID, the connection will be established through the server, since a LowID client cannot accept incoming connections. As a result, two LowID clients cannot connect to each other.
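The connection rules in the previous paragraph can be summed up in a few lines. This is only an illustrative sketch: the function name `transfer_route` and its return values are made up for this example and are not part of aMule or of the ED2K protocol.

```python
def transfer_route(downloader_high_id: bool, uploader_high_id: bool):
    """Return how two clients can reach each other, following the FAQ text.

    Hypothetical helper: the name and the return strings are assumptions.
    """
    if downloader_high_id and uploader_high_id:
        return "direct"        # both accept incoming connections
    if downloader_high_id or uploader_high_id:
        return "via server"    # the server helps set up the connection
    return None                # two LowID clients cannot connect


for pair in [(True, True), (True, False), (False, False)]:
    print(pair, "->", transfer_route(*pair))
```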
What is Kademlia?
Kademlia is a natural evolution of the ED2K network. Kademlia is the future. See Are there any limitations on the ED2K network? for more information on why Kademlia is necessary.
Since Kademlia is a decentralized network, it removes the bottleneck that was previously caused by the need for servers (though Lugdunum has done great work in reducing this bottleneck). Now, instead of connecting to a server, you just connect to a client (with a known IP address and port) that supports the Kademlia network. This is called bootstrapping.
Once connected, depending on your ability to accept incoming connections, you are given either "open" or "firewalled" status, which is similar to the HighID and LowID of the ED2K network. Then you are given an ID.
At the moment, "firewalled" users aren’t supported by the Kademlia network, and therefore won’t be given an ID and will be unable to connect. Firewalled support will be added later.
When searching, each client acts as a small server and is given responsibility for certain keywords or sources. This adds to the complexity of finding sources, as you no longer have a central server to ask, but instead will have to propagate the query through the network.
Currently, Kademlia isn't supported by aMule, but it will be soon.
Is Kademlia the same as Overnet?
Short and clear: No. Overnet is the natural serverless evolution of the eDonkey software, while Kademlia is the natural serverless evolution of *Mule clients. So it's the same philosophy, but with different rules. To learn how Overnet works, refer to http://www.edonkey2000.com/documentation/how_on.html but keep in mind that Overnet's development is closed until it reaches version 1.0, while Kademlia's development has been completely open from the very beginning.
What is a chunk?
In the ED2K protocol, to avoid sharing corrupt files, each file is divided into various parts, which are known as chunks, and then each chunk is hashed (read below to know what a hash is). Each chunk is 9.28MB in size, so a 15MB file will be divided into two chunks (9.28MB + 5.72MB), a 315KB file will be a single chunk and a 100MB file will be divided into 11 chunks (10x9.28MB + 7.2MB).
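The chunk arithmetic above can be reproduced with a small sketch. It assumes that "9.28MB" stands for 9,728,000 bytes (about 9.28 MiB) and that the "MB" and "KB" in the examples are binary units; both are assumptions of this example, as is the helper name `chunk_sizes`.

```python
CHUNK_SIZE = 9_728_000  # bytes; assumed exact value behind the "9.28MB" figure
MIB = 1024 * 1024       # assumption: the FAQ's "MB" is read as MiB
KIB = 1024


def chunk_sizes(file_size: int) -> list:
    """Split a file size (in bytes) into the sizes of its chunks."""
    full_chunks, remainder = divmod(file_size, CHUNK_SIZE)
    sizes = [CHUNK_SIZE] * full_chunks
    if remainder:
        sizes.append(remainder)  # the last chunk is usually smaller
    return sizes


for label, size in [("15MB", 15 * MIB), ("315KB", 315 * KIB), ("100MB", 100 * MIB)]:
    sizes = chunk_sizes(size)
    print(label, "->", len(sizes), "chunk(s), last one is", sizes[-1], "bytes")
```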
What is a hash?
Dividing each file into chunks (see What is a chunk?) avoids the problem of downloading a whole corrupted file, since only the corrupted chunk has to be downloaded again, but a method to identify corrupted chunks is needed. This is done by using MD4 hashes.
An MD4 hash is a practically unique value given to each chunk; it is the result of a mathematical operation on every single bit in the chunk. This means that modifying a single bit in a chunk results in a completely different hash. That means that it is possible to verify the integrity of each part of a file as it is downloaded.
Not only are the chunks hashed but also, in order to get a file hash, all the chunk hashes are concatenated one after the other in their file order (that is: chunk1's_hash+chunk2's_hash+chunk3's_hash+...) and the resulting string is hashed. This way, each file on the ED2K network has a unique identifier. The file hash isn't obtained by hashing the whole file, but by hashing the concatenated chunk hashes.
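The "hash of the chunk hashes" idea can be sketched as follows. MD4 is not available in every Python build, so this example uses MD5 as a stand-in purely to show the structure; the function name, the `new_hash` parameter and the tiny chunk size in the demo are assumptions of this sketch, and it follows the description above literally rather than reproducing what any real client does.

```python
import hashlib


def file_hash_from_chunks(data: bytes, chunk_size: int, new_hash=hashlib.md5) -> str:
    """Hash every chunk, concatenate the digests in file order, hash the result.

    new_hash is a stand-in constructor (MD5 here, NOT the MD4 used by ED2K).
    """
    chunk_digests = [
        new_hash(data[offset:offset + chunk_size]).digest()
        for offset in range(0, len(data), chunk_size)
    ]
    return new_hash(b"".join(chunk_digests)).hexdigest().upper()


data = b"abcdefgh"
# Demo chunk size of 4 bytes so that the data splits into two chunks.
print(file_hash_from_chunks(data, chunk_size=4))
# Changing a single byte changes one chunk hash and therefore the file hash.
print(file_hash_from_chunks(b"abcdefgX", chunk_size=4))
```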
In reality, you need both the hash of a file and its size. These pieces of information are embedded in the ED2K URLs found in many places.
Take this for example: 
ed2k://|file|eMule0.42f-Sources.zip|2407949|CC8C3B104AD58678F69858F1F9B736E9|/ 
The interesting parts are the field after the filename, "2407949", which is the size of the file in bytes, and the last field, "CC8C3B104AD58678F69858F1F9B736E9", which is the hash itself, stored as 32 hexadecimal digits.
The filename itself is irrelevant in the process of identifying the file.
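The link format shown above can be taken apart by splitting on the "|" character. The following is a minimal sketch, not aMule's own parser; the function name and the returned dictionary keys are made up for this example, and it only handles the simple file link form shown in this FAQ.

```python
import string


def parse_ed2k_file_link(link: str) -> dict:
    """Extract name, size and hash from a link like ed2k://|file|NAME|SIZE|HASH|/"""
    parts = link.strip().split("|")
    if len(parts) < 5 or parts[0] != "ed2k://" or parts[1] != "file":
        raise ValueError("not an ed2k file link")
    name, size, file_hash = parts[2], parts[3], parts[4]
    if len(file_hash) != 32 or any(c not in string.hexdigits for c in file_hash):
        raise ValueError("the hash must be 32 hexadecimal digits")
    return {"name": name, "size": int(size), "hash": file_hash.upper()}


link = "ed2k://|file|eMule0.42f-Sources.zip|2407949|CC8C3B104AD58678F69858F1F9B736E9|/"
info = parse_ed2k_file_link(link)
print(info["name"], info["size"], info["hash"])
```

Only the size and the hash identify the file; the name is kept just for display.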
Why after searching, some files which are the same appear as a different file in the results, although they even have the same name?
If you understood "What is a hash" you will understand this quickly. When a search is started, the server tells the ED2K client the filename and the hash of the complete file for each file that matches the search. If two files that look the same have any difference in their content, no matter how big or small, their hashes are different, so they are treated as different files. That's also the reason why two files with different filenames can appear as the same file: on the ED2K network, the filename isn't important, the hash is.
What is LowID and HighID?
Each client is assigned an identifying number, an ID, which is unique and distinguishes it from any other client on the server. If this ID is below 16777216 (about 16 million) then you have a LowID. If it's over, then you have a HighID. Whether you are given a high or a low ID depends only on having TCP port 4662 (or the one set up in Preferences) open. If you understood "What is ED2K" you will see that clients on LowID are unable to connect to many other clients (all those on LowID) and so will have a lower transfer rate. That's why having TCP port 4662 (or the one set in Preferences) open is so important. Also, some big servers refuse connections from clients on LowID, since LowID clients have data transferred through the server and those big servers could become overloaded.
For HighID clients, the ID is the result of a mathematical operation on their IP address: A + 256*B + 256*256*C + 256*256*256*D, where the IP is A.B.C.D. Also keep in mind that this ID is used for identification purposes only, so apart from having an ID over or under 16777216, it does not matter whether the ID is bigger or smaller. This means a client with an ID like 50000000 isn't better than a client with an ID like 49999999.
There's still an exception. Sometimes badly configured or very busy servers give a LowID to some clients although TCP port 4662 is open. These are rare exceptions, but it might happen sometimes.
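The ID formula and the LowID threshold described above can be tried out with a short sketch. The function names are invented for this example, and 192.0.2.1 is just a documentation address used as sample input.

```python
LOW_ID_LIMIT = 16_777_216  # 256 * 256 * 256, the threshold quoted above


def high_id_from_ip(ip: str) -> int:
    """Apply A + 256*B + 256*256*C + 256*256*256*D to an IP written as A.B.C.D."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return a + 256 * b + 256 * 256 * c + 256 * 256 * 256 * d


def is_low_id(client_id: int) -> bool:
    """IDs below the limit are LowIDs; their size carries no other meaning."""
    return client_id < LOW_ID_LIMIT


client_id = high_id_from_ip("192.0.2.1")
print(client_id, "LowID" if is_low_id(client_id) else "HighID")
```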
Which ports do I have to configure in a firewall or router to run aMule?
No specific ports need to be opened for aMule to work, but one is needed to get a HighID. As mentioned above, to be given a HighID, TCP port 4662 (or the one set in Preferences) must be listening.
Apart from that port, for an optimal ED2K experience two more ports should be enabled. First, UDP port 4672 (which can also be changed to any other number in Preferences) and second, the secondary UDP port, which can't be set in Preferences. This UDP port is your TCP port + 3 (e.g. if TCP=4662 then UDP=4665).
What does each port do?
Well, since most ports can be configured to be set to any other number, the defaults will be listed:
- 4662 TCP
- Client to client transfers.
- 4672 UDP
- Extended eMule protocol, Queue Rating, File Reask Ping
- 4661 TCP
- Opened on server. Allows connection to server.
- 4665 UDP
- Opened on server. Allows asking for sources. It is always server TCP port + 4.
- 4711 TCP
- WebServer listening port.
- 4712 TCP
- External Connection port. Used to communicate aMule with other applications such as aMule WebServer or aMuleCMD.
Although officially the secondary UDP port is server TCP port + 4, some (most?) implementations treat it as client TCP port + 3. Either way, this port is mostly unused (aMule doesn't use it, eMule doesn't have it).
Are there any limitations on the ED2K network?
Not many, but yes, there are: two natural limits and a "forced" limitation. The two natural limits have already been mentioned. First, the issues with LowID users (their transfers involve data going through the server, and two LowID clients can't share with each other). Second, although ED2K is a P2P protocol, it needs servers to establish the P2P connection. The latter is solved in the Kademlia protocol.
As for the "forced" limitation, it exists only to make sure that clients share, so that the ED2K network will not disappear: clients which have an upload limit of X KBps, where X is between 0 and 3.99 (both included), can download at a maximum of X*3 KBps. Clients which have an upload limit of Y KBps, where Y is between 4 and 9.99 (both included), can download at a maximum of Y*4 KBps. Clients with an upload limit of 10 KBps or more have no download limit. This restriction is set in the client application, so it could be bypassed by hacking the code, but that would probably result in being banned from the servers you connect to.
Also, any client is forced to allow at least three upload slots, so it's not possible to allow more than upload_limit/3 KBps per slot.
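The download cap and the per-slot ceiling described above can be written as two small functions. This is a sketch of the rules as stated in this FAQ, not code taken from aMule; the function names are hypothetical, and the ranges are treated as continuous (below 4, below 10) as a simplifying assumption.

```python
def max_download_kbps(upload_limit: float):
    """Return the download cap in KBps for a given upload limit, or None if unlimited."""
    if upload_limit < 4:
        return upload_limit * 3   # 0 to 3.99 KBps upload: download at most 3x
    if upload_limit < 10:
        return upload_limit * 4   # 4 to 9.99 KBps upload: download at most 4x
    return None                   # 10 KBps or more: no download limit


def max_per_slot_kbps(upload_limit: float) -> float:
    """With at least three upload slots, one slot gets at most a third of the limit."""
    return upload_limit / 3


for limit in (3, 5, 10):
    print(limit, "KBps up ->", max_download_kbps(limit), "KBps down,",
          round(max_per_slot_kbps(limit), 2), "KBps per slot at most")
```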
There is one last limit: the network's file size limit is 4 GB.
In search window, what filter stands for which filetype?
Keep in mind that the filters in the search window don't depend on the file type, but on the extension of the filename, in the following way:
Archive: .ace .arj .rar .tar.bz2 .tar.gz .zip .Z
Audio: .aac .ape .au .mp2 .mp3 .mp4 .mpc .ogg .wav .wma
CDImage: .bin .ccd .cue .img .iso .nrg .sub
Picture: .bmp .gif .jpeg .jpg .png .tif
Program: .com .exe
Video: .avi .divx .mov .mpeg .mpg .ogg .ram .rm .vivo .vob
So, a movie whose filename is "Birthday.zip" will appear in the Archive filter, but not in the Video filter.
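Since the filters only look at the filename, the table above can be expressed as a simple lookup. The sketch below is illustrative only: the names `SEARCH_FILTERS` and `filters_for` are made up, and matching extensions case-insensitively is an assumption of this example.

```python
SEARCH_FILTERS = {
    "Archive": (".ace", ".arj", ".rar", ".tar.bz2", ".tar.gz", ".zip", ".Z"),
    "Audio": (".aac", ".ape", ".au", ".mp2", ".mp3", ".mp4", ".mpc", ".ogg", ".wav", ".wma"),
    "CDImage": (".bin", ".ccd", ".cue", ".img", ".iso", ".nrg", ".sub"),
    "Picture": (".bmp", ".gif", ".jpeg", ".jpg", ".png", ".tif"),
    "Program": (".com", ".exe"),
    "Video": (".avi", ".divx", ".mov", ".mpeg", ".mpg", ".ogg", ".ram", ".rm", ".vivo", ".vob"),
}


def filters_for(filename: str) -> list:
    """Return every filter whose extension list matches the end of the filename."""
    lowered = filename.lower()  # assumption: case does not matter
    return [
        name
        for name, extensions in SEARCH_FILTERS.items()
        if any(lowered.endswith(ext.lower()) for ext in extensions)
    ]


print(filters_for("Birthday.zip"))  # the content may be a movie, the name decides
print(filters_for("concert.ogg"))   # .ogg is listed under two filters
```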
What is a source?
A source is a client which is sharing at least one chunk of a file that is in your download queue and that you have not yet completed. Obviously, the more sources you can get for a given file, the better your chances of downloading the file and the quicker you'll download it. Keep in mind that there's a difference between "sources" and "available sources" if you're on LowID: "sources" stands for clients sharing a chunk or file you still haven't completed, while "available sources" stands for those clients from whom you can actually download (that is, sources that are on HighID).
What is all that credits, rate and score stuff about?
All three concepts have to do with the way in which the ED2K network sets priorities in upload queues.
The score is the most important value: the client with the highest score will be the next client you provide a slot to. The score value is calculated like this: score = rate x time_waiting_in_seconds / 100
So, to understand this, we must know what rate is.
Rate can be understood as an objective preference, that is, the preference a client is given regardless of how long it has been waiting. When a client is added to the upload queue, it gets a rate of 100. This value is then modified as follows:
According to the amount of credits, the rate will be multiplied by 1x to 10x.
Depending on the file priority, it will be multiplied by 0.2x to 1.8x (Release 1.8x, High 0.9x, Normal 0.7x, Low 0.6x, Very Low: 0.2x).
Users on specific old clients which put too much load on the network will be penalized by having their rate multiplied by 0.5x.
Banned clients will instantly get no rate (that is, their rate will be multiplied by 0).
These multiplying values are known as "modifiers". Clients with a modifier value strictly bigger than 1 will be marked in yellow in the icon.
So only credits are left to explain. Credits are a reward you get for uploading files to a specific user. Credits are exchanged between two specific clients; they are not global, so your own credits can't be viewed, although you can see the credits any other user has with you (that is, the credits you owe that client). Since credits are managed by the uploading client, you might be uploading to a client with no credits support, so you will gain no credits with it, although that client will still get credits with you if it uploads to you, since you do have credits support. These credits are stored in the clients.met file.
The credits modifier used by rate is the lower of these two values:
(upload_total x 2)/download_total or sqrt(upload_total+2) where both upload_total and download_total are measured in MBs.
If the result is lower than 1, then it is set to 1 and if it is bigger than 10, it is set to 10. In addition, if the uploaded total is less than 1MB, the modifier is set to 1 and if the downloaded total is equal to 0, then the modifier is set to 10.
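Putting the score, rate and credits rules together gives the following sketch. It only mirrors the formulas in this FAQ and is not taken from aMule's source; the function and parameter names are hypothetical, and checking the "less than 1MB uploaded" rule before the "nothing downloaded" rule is an assumption, since the text does not say which one wins when both apply.

```python
import math

PRIORITY_MODIFIER = {"Release": 1.8, "High": 0.9, "Normal": 0.7, "Low": 0.6, "Very Low": 0.2}


def credit_modifier(upload_total_mb: float, download_total_mb: float) -> float:
    """Lower of (upload*2)/download and sqrt(upload+2), kept between 1 and 10."""
    if upload_total_mb < 1:
        return 1.0    # assumption: this rule is checked first
    if download_total_mb == 0:
        return 10.0
    value = min(upload_total_mb * 2 / download_total_mb, math.sqrt(upload_total_mb + 2))
    return max(1.0, min(10.0, value))


def rate(credits: float, priority: str, old_client: bool = False, banned: bool = False) -> float:
    """Start from 100 and apply the modifiers listed above."""
    if banned:
        return 0.0
    value = 100 * credits * PRIORITY_MODIFIER[priority]
    return value * 0.5 if old_client else value


def score(rate_value: float, seconds_waiting: float) -> float:
    """score = rate x time_waiting_in_seconds / 100"""
    return rate_value * seconds_waiting / 100


modifier = credit_modifier(upload_total_mb=4, download_total_mb=4)
print(modifier, score(rate(modifier, "Normal"), seconds_waiting=600))
```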
What is a slot?
When uploading files, your upload bandwidth (which may vary depending on the upload limit or the natural connection-type upload limit) will be divided into slots. So, each slot is an amount of KBps which will be assigned to each client who tries to download from you.
