Redundancy search: faster p2p downloads
It looks like some people have just started to discover the universe. A short, elementary article published on the Technology Review website and recommended by one of our readers describes a new feature proposed for peer-to-peer networks. David Andersen, a professor of computer science at Carnegie Mellon University, worked with the Purdue group to develop similarity-enhanced transfer (SET), a way to increase the size of the pool of uploaders. The approach takes advantage of multiple variants of the same music files, video clips, and software, which are often floating around file-distribution networks. “We hope that SET gives you access to a larger pool of people to download from,” says Andersen. “And by doing so, we think you’re more likely to find one of these people who have more spare capacity.”
The idea isn’t bad. I’ve been thinking about something like that for months, and you probably know it. When a huge and very popular release appears in public, there are usually many different torrents available for it. They all share the same data, the same movie, the same RAR archives, just with a different tracker or a few other small differences. It could only benefit all users if someone found a way to connect all these torrents and let their peers exchange data with each other. Before Andersen and his colleagues conducted their study, it was not at all clear how much redundancy existed in file-sharing networks and whether it could be exploited, says Cornell University computer scientist Emin Gün Sirer, who was not involved in the study. The SET team analyzed almost two terabytes of music and video files from file-sharing networks, and it discovered that similar files typically shared anywhere between 20 and 99 percent of their content. With music files, even misspellings in user-defined header labels that identify artist and song titles are enough to throw off BitTorrent, despite the fact that 99 percent of the file is the same.
One challenge in devising a distribution system that can locate similar files is that the system must search not just for each file but also for every chunk within that file. A 700-megabyte video clip may be divided into 40,000 chunks, which means that the system must make several billion comparisons. SET is a hybrid system that first locates users with identical files before searching for requested chunks in file variants. SET’s innovation in the latter task is what the researchers call handprinting, which efficiently identifies similar files using a constant number of search queries regardless of the file size. Locating a file with just 10 percent similarity could speed up downloads by 8 percent. For music files with greater than 90 percent similarity, a five-minute download on BitTorrent would take just over two minutes with SET. For a single user, the savings could be even greater if he or she happens to be downloading an unpopular variant of a common file. Turning this feature into reality would be extremely demanding, though, and for now it is still just a sound from the distant future…
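To make the chunk-search problem a bit more concrete, here is a rough Python sketch. The article does not say how handprinting works internally, so everything here is an assumption for illustration: the tiny 16-byte chunks, the SHA-1 hashes, the handprint size of 4, and the rule "keep the smallest chunk hashes" (a min-hash style sample) are all made up. The only point is that a fixed-size sample of chunk hashes lets you look for similar files with a constant number of lookups, instead of comparing every chunk with every other chunk (40,000 × 40,000 is already 1.6 billion pairings for just two files of that size).

```python
import hashlib

CHUNK_SIZE = 16      # bytes; tiny on purpose so the demo stays readable (assumption)
HANDPRINT_SIZE = 4   # chunk hashes kept per file, independent of file size (assumption)


def naive_comparisons(chunks_per_file: int) -> int:
    """Pairings needed to compare every chunk of one file with every chunk of another."""
    return chunks_per_file ** 2


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and hash each chunk."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def handprint(data: bytes, k: int = HANDPRINT_SIZE) -> set[str]:
    """Keep only the k smallest distinct chunk hashes as a fixed-size sample."""
    return set(sorted(set(chunk_hashes(data)))[:k])


def looks_similar(a: bytes, b: bytes) -> bool:
    """Treat two files as candidates for chunk sharing if their handprints overlap."""
    return bool(handprint(a) & handprint(b))


# Hypothetical demo data: a 10-chunk file, a variant with a different first
# chunk (think of a misspelled header label), and an unrelated file.
original = b"".join(f"chunk-{i:02d}-payload".encode() for i in range(10))
variant = b"HEADER-TYPO-xxxx" + original[CHUNK_SIZE:]
unrelated = b"".join(f"other-{i:02d}-content".encode() for i in range(10))

print(naive_comparisons(40_000))           # 1600000000
print(looks_similar(original, variant))    # True
print(looks_similar(original, unrelated))  # False
```

Each file publishes only `HANDPRINT_SIZE` hashes here, so a downloader would need only that many lookups no matter how large the file is. Whether the real SET picks its sample this way is not stated in the article.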


Comments(19)
Could they hurry up with this…it would be a vast improvement!
why would you give this a 1??
Couldn’t this be done with intelligent SFV reading?
does this just make it easier for the likes of the m.p.a.a to log your I.P??
just imagine the possibility of mixing chunks… woa!
Let’s suppose I get a chunk for an MP3 from another MP3 which shares the same chunk, or a video that gets a chunk from a RAR file… I know that’s too much dreaming, but if we can get a sharing system like that, there won’t be any prosecution for getting illegal content, because you got a file from different chunks, chunks from different files… you got: a Frankenstein file!
How can anyone prove that the file you got is the original one? Because you got it from parts of other files.
just food for thought..
this is like 4-5 days old news
A MUST HAVE system!
@ X so what? First I’ve heard of it…even if it is old news… I wish we could all be as up to date as you. You’re so on the cutting edge…blah blah blah
X is right. I have seen this on several sites days ago. And no private site will ever allow this.
What a great idea…. Who’s up for the math?
Sounds like the idea of emule / edonkey..
That worked ! but not as fast as everyone thought !
There are tools like shareaza that combine p2p networks…
so that redundancy is already taken care of ….
lets see what future has for us
Whatever, even if this doesn’t turn out to work, you can also boost download speed a lot just by adding more trackers to the file yourself. I do that all the time, and it works perfectly.
if u got bbc rss feed(default on firefox).. it was on like 4-5 days ago like X said..
Yeah I’ve been reading about this technology. Funny i think i kinda beat them to the punch, with Utorrent you can add multiple trackers to a torrent.
Example:
23 dvdscr appears on NTi and torrentleech and demonoid. Same release, you simply download each torrent, open with utorrent and it asks you if you wish to add the trackers. If the file is slightly different, like an added sample or nfo, you simply tell it NOT to download the different parts.
This takes it a step further as they are making it possible to use peers that have a similar, but not the same file. I’m very interested to see this in action and understand how it works, as there are various formats and settings when it comes to music and video files, so I’d have to assume the file would need to be the same format/bitrate for it to use it.
Unless it allows for on the fly over the net encoding, which I doubt!
wow boys, u just discovered Distributed hash tables (DHTs)
nice…
Some of you have the wrong idea. It is not about two trackers having the same file. It is about one tracker having two different files. Example:
file 1: The quick brown fox ran
file 2: The brown fox jumped up
Totally different files but with quite a bit of common information. In the SET idea if you were downloading file 1 you could get “The, brown, fox” from anybody with file 1 or file 2.
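The same toy example as a few lines of Python, treating each word as one "chunk". The function name `shared_pieces` is made up for illustration, and real systems work on hashed byte ranges rather than words:

```python
def shared_pieces(wanted: str, other: str) -> list[str]:
    """Return the pieces of `wanted` that a holder of `other` could also supply."""
    available = set(other.split())
    return [word for word in wanted.split() if word in available]


file_1 = "The quick brown fox ran"
file_2 = "The brown fox jumped up"

print(shared_pieces(file_1, file_2))  # ['The', 'brown', 'fox']
```

Only "quick" and "ran" would have to come from someone who actually has file 1.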
I agree with z_o_z_0 (nice analogy btw). What this system is aiming to do is let us benefit not only from downloading from other trackers containing the exact same files, but also from those that contain only a certain percentage of the file we need, thus greatly improving the rates at which we will ultimately receive our downloads.
For example, if I were searching for a file containing a particular television episode, I would not only be able to download from the trackers with that exact episode’s filename, but I would also be able to download from those trackers that contain the entire season of which my episode is a part.
Why should there always be some idiot with an egocentric attitude, like X or Eriol, saying “I knew” on every post? Guys, go looking for some pu****s and spend your time out of the chair… leave us alone. If you don’t like something, just shut up and change site…
Oh for a dumbness filter on comments. Everybody please RTFA. Yes, you can download the _same_ file from multiple torrents, even from different networks, but if you just do that by hand you’ll end up downloading most of it twice because of different chunk sizes and overlapping chunks. This technique could do that automatically and with much less overhead. And it goes much farther than that. The “S” in SET stands for Similarity, not Sameness.
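A small Python sketch of the chunk-size point, under made-up assumptions: the 1024-byte "release" and the 64- and 128-byte piece sizes are scaled-down stand-ins, and `covering_pieces` is a hypothetical helper. BitTorrent checks pieces by SHA-1 hash, so the same bytes cut at two different piece sizes give piece lists with nothing in common, and any matching has to be done by byte range instead:

```python
import hashlib


def piece_hashes(data: bytes, piece_size: int) -> list[str]:
    """Hash each fixed-size piece of the data, like a torrent's piece list."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def covering_pieces(offset: int, length: int, piece_size: int) -> range:
    """Indices of the pieces (of the given size) that cover a byte range."""
    first = offset // piece_size
    last = (offset + length - 1) // piece_size
    return range(first, last + 1)


# Stand-in for one release shared by two torrents (hypothetical data).
release = b"".join(i.to_bytes(4, "big") for i in range(256))  # 1024 bytes

torrent_a = piece_hashes(release, 64)    # 16 pieces
torrent_b = piece_hashes(release, 128)   # 8 pieces

# Same bytes, yet no piece hash from one torrent appears in the other.
print(len(set(torrent_a) & set(torrent_b)))  # 0

# Piece 1 of torrent B (bytes 128..255) is covered by pieces 2 and 3 of torrent A.
print(list(covering_pieces(128, 128, 64)))  # [2, 3]
```

This only shows why the two piece lists do not line up on their own. It says nothing about how SET or any particular client actually handles the mapping.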