  #1  
Old 15-02-2019, 15:31
elit elit is offline
Registered User
 
Join Date: Jun 2017
Location: sun
Posts: 237
Thanks: 174
Thanked 305 Times in 106 Posts
elit is on a distinguished road
How to get the most out of lolz parallelism

Originally I was planning to add this as a comment to my "Cost vs return debate" thread, but I did not want to necro my old topic, and this is not completely relevant to it anyway. I am not going to compare lolz with other compressors here; I am going to share my experience of how its MT works and how to get the most out of it.

You may have noticed that even with -mt1, lolz benefits from other cores by offloading certain calculations (most likely detection) to other threads. Still, it is very common for users to set the number of threads equal to the number of CPU cores. I ran a small test and compared MT scaling on my Intel 4690K CPU. I used version 21a7, but later I also tried the latest 22c4b just to see ldmf in action and whether it is efficient speed-wise. This test is not about final file size: it was done on a small ~600mb file, and size variations were negligible regardless of settings across all tests, so I will focus only on speed and time here.

The basic setting was -d128 -mc8 -tt1; from there I changed only -mt, and for the final ldmf tests only -ldl. Let's start with -mt1:
mt1.png
^Compression on a "single" thread finished in 3:48min at an average speed of ~2762k/s. I put "single" in quotes deliberately because CPU usage often spiked as high as 74%(!) and was at least 36-42% most of the time.

Let's try -mt2:
mt2.png
^We see good, though not quite linear, scaling: speed jumped to ~4448k/s (about 1.6x) and the run took 2:21min.

This is -mt3:
mt3.png
^Speed is even higher at ~5100k/s and the run took only 2:03min. Scaling is already worse: we gained only ~650k/s vs the previous ~1700k/s.

Finally, -mt4 (and you already know what to expect, don't you):
mt4.png
^Yep, speed is actually slightly lower than with -mt3 and the time is 1sec longer, but this is not even the full picture. I ran this test twice, and what I am showing you is the second run, where I did not even touch the computer during processing. In the first run, just having a web browser open took the speed down to ~4400k/s, which is less than -mt3! The most likely culprit is L2/L3 cache pressure (cache misses).
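
To put the scaling in numbers, here is a minimal Python sketch that turns the reported speeds into speedup and per-thread efficiency. The function and variable names are my own, and -mt4 is left out because the text only says it was slightly slower than -mt3.

```python
# Speeds in k/s as reported above for the ~600mb test file.
# -mt4 is omitted: only "slightly slower than -mt3" was reported, no exact figure.
measured = {1: 2762, 2: 4448, 3: 5100}

def scaling(speeds):
    """Return {threads: (speedup vs 1 thread, efficiency per thread)}."""
    base = speeds[1]
    return {n: (s / base, s / base / n) for n, s in sorted(speeds.items())}

table = scaling(measured)
for n, (speedup, eff) in table.items():
    print(f"-mt{n}: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Efficiency per thread drops with every added thread, which is the pattern described above. Keep in mind that -mt1 already loads more than one core, so the baseline is not a true single-core run.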

In other words, running 2 threads on a 4C CPU is the most efficient option, and even -mt1 is not as bad as I originally thought. However, since I strongly suspect a heavy dependency on L2/L3 cache, you would be well advised to run your own tests and set lolz's -mt parameter to get the most out of YOUR OWN CPU. It wouldn't surprise me at all if, for example, AMD CPUs behaved differently, and by the same token Hyper-Threading CPUs are worth considering (and testing). Maybe on CPUs with more than 4 cores it will be necessary to use 2 threads fewer than the core count, not just 1, especially if CPU cache is a concern. Anyway, I thought you might want to see this... oh, and let's throw some -ldmf from the most recent lolz version into the mix, but please keep in mind that this test may not be good enough for that, as the file was too small.

But anyway, -ldmf with default -ldl8:
ldmf8.png
and -ldl5:
ldmf5.png
^These tests were done with -mt1; compared with the previous -mt1 run, it's a difference of a few seconds.

So yeah, that's it.

EDIT: on bigger files, ldmf time extends to a couple of minutes or more. On ~16gb it added around 12-24min. That makes it slower than srep, but in the picture of the whole compression, which took more than 1h30min (1h34min was the quickest, with -mt3; -mt1 took 3h+), it's doable. Also, ldmf doesn't do a full pre-pass over the data like srep, which would add another 6-12min if you compressed big things like 60gb+ on a regular HDD. If the added time of ldmf concerns you, you are really better off just using lzma: FA's 4x4 is simply unbeatable when it comes to ratio vs speed/memory (srep+xlzma:lc8 compressed those 16gb in ~16min with no worse than ~8% loss == ~700% speedup vs ~8% ratio). But between external lzma variants (which are slower than xlzma) and lolz, I would rather use lolz. I mean, I want either max compression or max efficiency.

EDIT2: unfortunately I have bad news: ldmf really doesn't work well with multiple threads, even with mtt=0. From that 16gb dataset (inflated "We Happy Few" game .pak files):
Quote:
-d128 -tt1 -mc8 -mt1 -ldmf1 = 5.6gb > 3+h
-d128 -tt1 -mc8 -mt3 -ldmf1 = 6gb > 2h
-d128 -tt1 -mc8 -mt3 -ldmf0 = 6.1gb > 1h46min
-srep:m3f+xlzma:lc8 = 6.1gb > 16min
EDIT3: finally:
Quote:
srep:m3f+lolz:d128:tt1:mc8:mt3:ldmf0 = 5.7gb > 1h17min
^From the above it should be clear that:
- if you want to use more than 1T in lolz, srep is the only good option; ldmf is then pointless and would only add to compression time, so make sure you disable it if you use srep with -mt > 1
- srep+lolz[MT] can still achieve results very similar to lolz[1T]+ldmf, with only slightly worse compression but a significant speedup, in my case about 3x
- even with srep and -mt3, though, lolz is still ~5x slower than srep+xlzma, now with only ~6.5% better cmp ratio
- between lolz[1T]+ldmf and lolz[3T]+srep there is only a ~1.75% cmp ratio difference, but a ~300% speed difference
- of course, there will be variation with different data/games, but in my past experience there was no more than ~2% variation 90+% of the time
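
The percentages in the list above can be re-derived from the quoted results with a short Python sketch. The labels and the `compare` helper are made up for illustration, and taking "3+h" as exactly 180 minutes is an assumption (it is only a lower bound).

```python
# Quoted results for the ~16gb dataset: (compressed size in GB, time in minutes).
# ASSUMPTION: "3+h" is entered as 180 minutes, which is only a lower bound.
results = {
    "lolz[1T]+ldmf": (5.6, 180),
    "lolz[3T]+ldmf": (6.0, 120),
    "lolz[3T]": (6.1, 106),
    "srep+xlzma": (6.1, 16),
    "srep+lolz[3T]": (5.7, 77),
}

def compare(a, b):
    """Return (how many % smaller a's output is than b's, how many times slower a is)."""
    size_a, time_a = results[a]
    size_b, time_b = results[b]
    smaller_pct = (size_b - size_a) / size_b * 100
    slower_x = time_a / time_b
    return smaller_pct, slower_x

for a, b in [("srep+lolz[3T]", "srep+xlzma"), ("lolz[1T]+ldmf", "srep+lolz[3T]")]:
    pct, factor = compare(a, b)
    print(f"{a} vs {b}: {pct:.2f}% smaller, {factor:.1f}x slower")
```

With the 180-minute lower bound the second comparison comes out at roughly 2.3x, so the "about 3x" figure corresponds to an -mt1 run of closer to 3h50min.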

Then again, this is just extra info. The main topic is lolz and its best threading settings plus ldmf, and on that the conclusion is that it may be better to set -mt to MAX_CPU-1 or MAX_CPU-2 depending on CPU type, and to use ldmf only with -mt1 AND -mtt0; otherwise go with srep (for now at least). If you want to keep the best ratio with lolz[1T]+ldmf, there is one way around it: process several games separately at the same time, which overall makes more efficient use of CPU cores and cuts total time.

Last edited by elit; 18-02-2019 at 12:30.
  #2  
Old 16-02-2019, 11:08
KaktoR KaktoR is offline
Lame User
 
Join Date: Jan 2012
Location: From outer space
Posts: 3,577
Thanks: 942
Thanked 5,835 Times in 2,153 Posts
KaktoR is on a distinguished road
Had the same with mt options in my tests.

mt6 gave around 2800k/s, while mt12 gave just ~3200k/s

But I think it depends on the input you have.
__________________
Haters gonna hate
  #3  
Old 16-02-2019, 12:08
elit elit is offline
KaktoR, do you happen to have a Ryzen 2600 variant (6C12T)? If so, the fact that you can still see any benefit with max threads, even more so when half of them are just HT, is good information to know.
Btw, I don't see why input data should matter as long as it's the same for all tests. We should only care about different thread counts on the same processor (measuring effectiveness); it's not about my CPU vs yours, just to be clear (but yes, from my observation I too believe different input data does affect the speed of lolz).

If you have a 2600 or similar, those have a huge L3 cache, although I would also try 2-3T. The 2600 also has a different design: it acts somewhat like NUMA but requires an external AMD application to switch the way the cores work. You could get very different results by tweaking -mt and the CPU mode.

Last edited by elit; 16-02-2019 at 12:11.
  #4  
Old 16-02-2019, 12:52
KaktoR KaktoR is offline
Yes, Ryzen 5 2600 6C 12T here.

I think so because on other inputs I got nearly 4000k/s with mt6 (in my last test today I got only 2800k/s to 3000k/s at most. OK, admittedly the last input was about 16GB and the other one just 3GB). And of course I didn't do anything else during that time other than compress with lolz.
  #5  
Old 16-02-2019, 13:59
elit elit is offline
You need to choose one data set and run all tests on it, with different -mt[N] and otherwise identical settings of course. For your CPU the most important values would be 1-2-3-4-5-6-8-10-12, each run twice with the 2 different modes set through the AMD app. Then you could see how your CPU's scaling goes. The idea is to see which -mt[N] gives you the most effective point on the Pareto frontier. Also, the dictionary/block size should be small enough that even with 12T the input file can be divided into that many parts.
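
A sweep like the one described above could be scripted roughly as follows. This is only a sketch: `make_cmd` is a hypothetical callback that you must fill in with whatever command line you actually use for your lolz setup, and the 5% tolerance in `most_efficient` is an arbitrary choice.

```python
import subprocess
import time

def sweep(thread_counts, make_cmd, run=subprocess.run):
    """Run one compression per thread count and return {threads: seconds}."""
    timings = {}
    for n in thread_counts:
        cmd = make_cmd(n)  # hypothetical: builds your own command for -mt<n>
        start = time.perf_counter()
        run(cmd, check=True)  # raises if the compressor exits with an error
        timings[n] = time.perf_counter() - start
    return timings

def most_efficient(timings, tolerance=0.05):
    """Smallest thread count whose time is within `tolerance` of the fastest run."""
    best = min(timings.values())
    return min(n for n, t in timings.items() if t <= best * (1 + tolerance))

# Example call; "your-compress-script" is a placeholder, not a real tool:
# timings = sweep([1, 2, 3, 4, 5, 6, 8, 10, 12],
#                 lambda n: ["your-compress-script", str(n)])
# print(most_efficient(timings))
```

Use the same input file and the same other settings for every run, and leave the machine idle while it works, otherwise the timings are not comparable.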
  #6  
Old 13-04-2019, 10:08
Simorq Simorq is offline
Registered User
 
Join Date: Mar 2014
Location: Iran
Posts: 642
Thanks: 3,602
Thanked 1,293 Times in 464 Posts
Simorq is on a distinguished road
RZ 1600
Code:
Creating archive: VC4.Bin.001 using rep+srep:m3f:l512+lolz:dtb1:d32:mtt1:mt10:mc1023+diskspan:4410mb:4430mb
Compressed 25 files, 64,396,399,216 => 4,624,220,180 bytes. Ratio 7.18%    
Compression time: cpu 3093.48 sec/real 17804.47 sec = 17%. Speed 3.62 mB/s
All OK

Creating archive: VC4.Bin.001 using rep+srep:m3f:l512:m512+lolz:dtb1:d32:mtt1:mt6:mc1023+diskspan:4360mb:4430mb
Compressed 26 files, 64,248,717,854 => 4,571,791,380 bytes. Ratio 7.12%    
Compression time: cpu 2337.45 sec/real 17213.25 sec = 14%. Speed 3.73 mB/s
All OK
I think Lolz does not work with virtual cores.
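
For anyone who wants to compare such logs side by side, here is a small Python sketch that recomputes ratio, cpu/real and speed from the two runs above. The regular expressions are written for exactly this log layout and are an assumption, not an official log format.

```python
import re

log = """\
Compressed 25 files, 64,396,399,216 => 4,624,220,180 bytes. Ratio 7.18%
Compression time: cpu 3093.48 sec/real 17804.47 sec = 17%. Speed 3.62 mB/s
Compressed 26 files, 64,248,717,854 => 4,571,791,380 bytes. Ratio 7.12%
Compression time: cpu 2337.45 sec/real 17213.25 sec = 14%. Speed 3.73 mB/s
"""

SIZES = re.compile(r"Compressed \d+ files, ([\d,]+) => ([\d,]+) bytes")
TIMES = re.compile(r"cpu ([\d.]+) sec/real ([\d.]+) sec")

def to_int(text):
    """Convert '64,396,399,216' to an integer."""
    return int(text.replace(",", ""))

def summarize(text):
    """Pair each size line with its timing line and recompute the key figures."""
    sizes = [(to_int(a), to_int(b)) for a, b in SIZES.findall(text)]
    times = [(float(c), float(r)) for c, r in TIMES.findall(text)]
    rows = []
    for (src, dst), (cpu, real) in zip(sizes, times):
        rows.append({
            "ratio_pct": dst / src * 100,
            "cpu_over_real_pct": cpu / real * 100,
            "speed_mb_s": src / real / 1e6,  # decimal megabytes per second
        })
    return rows

for label, row in zip(["mt10", "mt6"], summarize(log)):
    print(label, {k: round(v, 2) for k, v in row.items()})
```

The real times of the mt10 and mt6 runs differ by only about 3%, which fits the impression that the extra virtual cores did not help here. Note that the two runs are not identical otherwise (srep parameters and file count differ slightly).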
  #7  
Old 14-04-2019, 06:11
elit elit is offline
Quote:
Originally Posted by Simorq View Post
RZ 1600
Yep, that's what I was talking about. Although you were still able to utilize all physical cores here. I wonder how it would be with -mt5, because perhaps HT-enabled CPUs don't need to be set to fewer than n physical cores.


BTW Simorq, you are among the most fantastic and most helpful people on this(and other) forum(s) I know. Thank you for all your dedication and personally for your last help with cls setup.
  #8  
Old 30-09-2019, 16:13
ZAZA4EVER ZAZA4EVER is offline
Registered User
 
Join Date: Oct 2016
Location: egypt
Posts: 173
Thanks: 565
Thanked 192 Times in 69 Posts
ZAZA4EVER is on a distinguished road
I ran some trials with lolz and got some results.
I want to ask you something: in my trials
I use tt12. What's your opinion on that? In particular, I get a good ratio with this method:
lolz:dtb1:d128m:mtt1:mt4:mc1023:tt12:fba0
I await your opinion, @elit
  #9  
Old 06-10-2019, 07:45
ozolt ozolt is offline
Banned
 
Join Date: Dec 2017
Location: Random
Posts: 16
Thanks: 16
Thanked 1 Time in 1 Post
ozolt is on a distinguished road
What's the optimal and standard settings for multithreaded lolz?
  #10  
Old 15-10-2019, 17:25
elit elit is offline
Quote:
Originally Posted by ZAZA4EVER View Post
I use (tt12) whats your opinion in that.
Back when I tested tt1 vs tt4, the difference in ratio was within 4% or less, but speed decreased 2-3x, which for the already slow lolz is a killer. This setting is similar to -mc in lzma, which many here also love to abuse to unbelievable levels (1000+) and which also causes a significant slowdown (with a similarly negligible effect on compression ratio in my past tests).

Honestly, I think using such levels is counterproductive in both cases. I have yet to see their benefit vs cost, and I dare anyone to prove me wrong with a specific, well-documented example. Until then, I will stick with tt1 and mc32.

Quote:
Originally Posted by ozolt View Post
What's the optimal and standard settings for multithreaded lolz?
There is no standard, and the most optimal one is mine ;p
mtt1 mc8-32 tt1 dto0 dm00
Copyright 2000-2020, FileForums @ https://fileforums.com