Thread: [Dev] XTool
  #152  
28-05-2020, 12:48
panker1992 is offline
Registered User
 
Join Date: Oct 2015
Location: Always Somewhere
Posts: 515
Dedup

Quote:
Originally Posted by FitGirl View Post
Thanks for returning to the project; deduplication is a very useful feature.
I have an idea that will reduce the RAM required for dedup. You could store rare/large duplicated streams in a temp file while keeping small/frequent dupes in RAM. That way the excessive HDD load won't happen, because reads will be rare, and the RAM won't be used that much. 1-2 GB is a pretty big amount even for machines with 8 GB, and for users with 4 GB, installation will be almost impossible considering srep and lolz/lzma, even with a page file. So reduction of, and control over, used RAM is a must, I think.

I'd recommend Halo Reach for testing dedup; it has tons of duplicate streams of different sizes.
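
To make that idea concrete, here is a minimal Python sketch of such a hybrid store (the class, the 64 KiB threshold, and all names are my own illustration, not XTool's actual code): small, frequently duplicated chunks stay in RAM, large ones are spilled to a temp file, so reads from disk stay rare.

Code:
import hashlib
import tempfile

SMALL_LIMIT = 64 * 1024  # assumed cutoff: chunks up to 64 KiB stay in RAM

class HybridDedupStore:
    def __init__(self):
        self.ram = {}                          # digest -> bytes (small chunks)
        self.spill = tempfile.TemporaryFile()  # large chunks go to disk
        self.index = {}                        # digest -> (offset, length) in spill

    def put(self, chunk: bytes) -> str:
        digest = hashlib.sha1(chunk).hexdigest()
        if digest in self.ram or digest in self.index:
            return digest                      # duplicate: store nothing new
        if len(chunk) <= SMALL_LIMIT:
            self.ram[digest] = chunk           # small/frequent: keep in RAM
        else:
            self.spill.seek(0, 2)              # append at end of temp file
            self.index[digest] = (self.spill.tell(), len(chunk))
            self.spill.write(chunk)
        return digest

    def get(self, digest: str) -> bytes:
        if digest in self.ram:
            return self.ram[digest]            # common case: no disk I/O
        offset, length = self.index[digest]    # rare case: one disk read
        self.spill.seek(offset)
        return self.spill.read(length)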
There is also a sorting trick that can reduce the RAM needed, and it works as follows.

srep does a very good job finding matches that are located far apart! To do that, it has to build a dictionary of the data it has already seen. IF you sort the files before feeding them to srep, similar content ends up adjacent, so most matches become short-range; the dictionary can stay smaller and matching gets faster.

Sorting as a preprocessing step can speed up the process and cost less RAM!! And it removes I/O overhead, because there are NO temps. A rough sketch follows below.
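
Here is a rough Python sketch of that preprocessing (the grouping key, extension then size, is just one heuristic I'd try; it is not something srep itself does): sort the inputs so similar files sit next to each other, then concatenate them into the stream the dedupper sees.

Code:
import os

def sorted_input_list(root: str) -> list[str]:
    # Collect all files, then order by extension and size so duplicates of
    # the same asset type end up adjacent in the stream srep will see.
    files = []
    for dirpath, _, names in os.walk(root):
        files += [os.path.join(dirpath, n) for n in names]
    return sorted(files, key=lambda p: (os.path.splitext(p)[1].lower(),
                                        os.path.getsize(p)))

def concat_sorted(root: str, out_path: str) -> None:
    # Concatenate the sorted files into one stream; a real packer would also
    # record file boundaries so the archive can be unpacked again.
    with open(out_path, "wb") as out:
        for path in sorted_input_list(root):
            with open(path, "rb") as f:
                out.write(f.read())

With this ordering, matches that srep would otherwise find far apart become near-range, so its dictionary only has to cover a much smaller window.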
__________________
My projects: Masked Compression, lzma2 (xz) on FreeArc, Zstd compressor for Windows
My optimizations: packjpg.exe, zstd, lzham, precomp-dev-0.45.