Hi again Dounia,
Congratulations, I have again picked your specific sub-problem of improving parallel playback performance of multiple 4k HDR-10 movies for “free support on resolving a non-trivial issue”, sponsored by Mathworks Neuroscience - MATLAB and Simulink Solutions - MATLAB & Simulink. You get advice and help on this sub-topic of your more general question for free; normally it would have cost your lab well over 1000 Euros. Mathworks provided us with 5 such “Jokers” for free support in the period October 2023 to June 2024, and you get to use the second one of five.
On to the topic:
I agree that according to NVidia's specs, 56 fps would be the absolute best you could hope for when playing 3 such movies in parallel, although that is a theoretical best, probably less in practice, ergo dropped frames. NVidia mentions on the page you cited that some more expensive Quadro gpu's may have multiple NVDEC engines and would therefore allow higher performance, but all GeForce cards have only one NVDEC per graphics card.
And those multi NVDEC ones are very expensive, starting at 1500-2000 Euros for entry level models: https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new
But luckily, for your specific use case of short movie clips without sound, there is some cost-free or cheaper trickery that might work. There are some other pitfalls though, due to dual-display use and HDR, so let's see how this goes.
My first question would be: how do you address your two monitors? Do you use one single PTB onscreen window that spans both HDR monitors, because the two monitors are set up as one single virtual ultra-wide HDR monitor via NVidia Mosaic functionality? Your display_indexes() selection code suggests otherwise, but your movie playback code suggests there is only one onscreen window. This becomes interesting, or rather potentially problematic, if we assume that presentation timing control matters and you want the two monitors to display the same frames of each of the two movies, i.e. play both presented movies in sync. So there might be challenges ahead.
Wrt. actual movie playback:
1. Your script doesn’t have optimal parallelism between fetching of frames from the different movies, as the GetMovieImage calls ask to block for arrival of new movie frames (the waitForImage = 1 flag). For better performance you’d want to avoid blocking and instead poll each movie for availability of new frames, and only redraw and update the display when new movie frames have arrived. The approach can be seen in PlayDualMoviesDemo.m and PlayDualMoviesTutorial.m.
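A minimal polling loop sketch in the spirit of those demos (hedged; the `movies` handle vector and `dstRects` destination rects are assumptions, not from your script, and 'PlayMovie' is assumed to have been started on each movie):

```matlab
% Sketch: poll each movie without blocking (waitForImage = 0), and only
% redraw + flip when at least one movie delivered a new frame.
tex = zeros(1, length(movies));
while ~KbCheck
    anyNew = false;
    for i = 1:length(movies)
        % 0 = polling: returns a new texture handle, 0 if no new frame
        % is ready yet, or -1 if the movie has ended.
        newtex = Screen('GetMovieImage', win, movies(i), 0);
        if newtex > 0
            % Release the previous frame's texture, keep the new one:
            if tex(i) > 0
                Screen('Close', tex(i));
            end
            tex(i) = newtex;
            anyNew = true;
        end
    end
    if anyNew
        for i = 1:length(movies)
            if tex(i) > 0
                Screen('DrawTexture', win, tex(i), [], dstRects(:, i)');
            end
        end
        Screen('Flip', win);
    end
end
```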
2. One cheap way to try to maybe get by with your current hardware would be to improve buffering of data during live playback. This will probably only help for non-looped playback though. In your code, set specialFlags1 = 2 + 256 to disable any decoding and output of sound (saving all the computation time and overhead for that), and to disable deinterlacing (pointless for non-interlaced footage, but maybe it gets some processing block out of the way for slightly lower overhead anyway). Then set async = 4 and preloadSecs = 4 or something around that. This only works meaningfully for movies without sound like yours, and would buffer up to preloadSecs == 4 seconds of decoded movie data internally. You’d essentially open all movies, then at the beginning of a trial call Screen('PlayMovie', ...) on all of them, which would start video decoding into the internal buffer queue. Then wait up to preloadSecs seconds before actually starting your GetMovieImage loop to fetch+draw+display frames. This way GStreamer would fill up a reservoir of preloadSecs seconds of movie data, and you would then drain it in your loop at an appropriate speed. In this case it might make sense to use the wait flag 1 in GetMovieImage as you do right now, and only draw and flip once you have new frames from each movie. Given that decoding at 56 fps is not fast enough to sustain 60 fps, playback would drain the internal buffer queue, but possibly slowly enough / with enough headstart to make it through the movies without dropping frames or slowing down below the target 60 fps. Note that due to the async = 4 setting, playback should never drop any movie frames, but play out all frames in order. The whole playback would slow down to < 60 fps though if this approach can’t keep up. A preloadSecs of at least 10 seconds would allow buffering a whole movie this way, but I don’t think your machine has enough RAM to do this with 3 movies of 10 seconds each, as the movies alone would need ~42 GB of RAM and your machine only has 32 GB in total.
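A sketch of how that setup could look (hedged; `moviefiles` and `dstRects` are assumptions, not from your script):

```matlab
% Sketch: open three soundless movies with async = 4 buffering,
% specialFlags1 = 2 + 256 (no sound, no deinterlacing), pixelFormat 11:
preloadSecs = 4;
for i = 1:3
    movies(i) = Screen('OpenMovie', win, moviefiles{i}, 4, preloadSecs, 2 + 256, 11);
end

% Trial start: begin decoding into the internal buffer queues...
for i = 1:3
    Screen('PlayMovie', movies(i), 1);
end
% ...and give GStreamer a headstart to fill its reservoir:
WaitSecs(preloadSecs);

% Then run the usual blocking fetch+draw+flip loop (wait flag 1):
while true
    for i = 1:3
        tex(i) = Screen('GetMovieImage', win, movies(i), 1);
    end
    if any(tex <= 0)
        break; % At least one movie reached its end.
    end
    for i = 1:3
        Screen('DrawTexture', win, tex(i), [], dstRects(:, i)');
        Screen('Close', tex(i));
    end
    Screen('Flip', win);
end
```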
3. The slightly more expensive way, but still not very expensive, and much cheaper than adding more graphics cards or a way more powerful and expensive graphics card, would be to upgrade your computer to at least 64 GB of RAM. Looking up the specs of your processor suggests it could be upgraded to up to 128 GB. Then all content fits into the buffer queue, you can set preloadSecs to a bit more than 10 seconds, and make it through the movies without slowdown, assuming enough headstart from ‘PlayMovie’ to the start of the actual playback loop. The good thing is that movies of your kind, YUV 4:2:0 10 bit encoding, decoded with pixelFormat 11 for HDR/WCG, only take up half the texture memory of YUV 4:4:4 or RGB 4:4:4 content, due to the chroma subsampling. So you get by with 2 Bytes per color component (10 bit net color packed into 16 bit color containers) * 1.5 = 3 Bytes per pixel, and 10 seconds * 60 fps * 3840 * 2160 pixels * 3 Bytes ~ 13.9 GB per movie, * 3 ~ 41.8 GB for all three 10 second movies.
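Just to make that arithmetic explicit, e.g. in Matlab:

```matlab
% Memory needed for the decoded buffer queue, per the reasoning above:
bytesPerPixel = 2 * 1.5;  % 2 Bytes per component, * 1.5 for YUV 4:2:0
perMovie = 10 * 60 * 3840 * 2160 * bytesPerPixel;  % 10 secs at 60 fps in 4k
fprintf('%.1f GB per movie, %.1f GB for all three movies.\n', ...
        perMovie / 2^30, 3 * perMovie / 2^30);
% Prints: 13.9 GB per movie, 41.7 GB for all three movies.
```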
4. If looped playback is needed and 2 or 3 don’t do the trick well enough, you could take playback matters completely into your own hands and load the whole movies into PTB textures at the beginning of a trial. You wouldn’t use active playback by GStreamer, but prefetch everything, and then your main loop would just loop through all textures and draw+flip them. This approach definitely needs a RAM upgrade to at least 64 GB. Follow the method in LoadMovieIntoTexturesDemo.m to prefetch everything, with a correspondingly long break between trials for loading the movies. That demo has various built-in benchmark modes to test how fast such loading can work.
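A rough sketch of that prefetch approach, loosely following LoadMovieIntoTexturesDemo.m (hedged; this assumes passive fetching, i.e. without an active 'PlayMovie', where each 'GetMovieImage' call delivers the next frame as fast as the decoder can go):

```matlab
% Sketch: prefetch one whole soundless movie into textures (needs lots
% of RAM!), then play back by cycling through the prefetched textures.
movie = Screen('OpenMovie', win, moviefile, 0, [], 2, 11);
texids = [];
while true
    tex = Screen('GetMovieImage', win, movie, 1);
    if tex <= 0
        break; % -1 signals end of movie.
    end
    texids(end+1) = tex; %#ok<AGROW>
end
Screen('CloseMovie', movie);

% Trial loop: draw+flip the prefetched textures at display refresh rate,
% looping as often as needed:
for n = 1:length(texids)
    Screen('DrawTexture', win, texids(n));
    Screen('Flip', win);
end
```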
If you use active playback as in items 2 and 3, you can check for dropped frames yourself by comparing the presentation timestamps pts optionally returned by [tex, pts] = Screen('GetMovieImage', win, movie, 1); against the expected delta of 1/fps. Or, at the end of playback, dropped = Screen('PlayMovie', movie, 0) will return a dropped frame count, based on comparing expected vs. actual pts. This is the same number as printed by PTB into the Matlab window if dropped frames are detected at the end of playback.
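E.g., a sketch of the pts based check for the playback loop of one movie:

```matlab
% Sketch: detect skipped frames by watching for pts jumps > 1 frame:
oldpts = -1;
while true
    [tex, pts] = Screen('GetMovieImage', win, movie, 1);
    if tex <= 0
        break; % End of movie.
    end
    if oldpts >= 0 && (pts - oldpts) > 1.5 / fps
        fprintf('Frame skip: pts jumped by %.1f frame durations.\n', ...
                (pts - oldpts) * fps);
    end
    oldpts = pts;
    Screen('DrawTexture', win, tex);
    Screen('Flip', win);
    Screen('Close', tex);
end
% PTB's own count of dropped frames, returned at stop:
dropped = Screen('PlayMovie', movie, 0);
```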
I’d first try approach 2 as the cheap solution; if that isn’t good enough, upgrade RAM to at least 64 GB and try 3, or 4 for full control, especially for looped playback.
Hybrid approaches between 3 and 4 are also possible, to shorten the loading time of movies (= wait time between trials), but they make the code more involved.
So these are some things to try. However, the use of multiple displays and HDR may cause some more complications and timing issues, independent of all the above, which just relates to optimizing decoding and playback of demanding movie content, and would apply to any movie content and pixelFormat 11 playback.
Once you choose a real HDR display, Psychtoolbox will switch to its Vulkan display backend for driving HDR display devices, instead of its standard OpenGL display backend, at least on MS-Windows. This comes with additional unavoidable overhead. My recent Psychtoolbox tests with Matlab R2023b showed another problem with Vulkan display on MS-Windows with NVidia graphics cards: flip completion is reported one frame too early, apparently an NVidia Vulkan graphics/display driver bug on MS-Windows. This means wrong visual stimulus onset timestamps, but ironically it could help slightly with performance in your case if one doesn’t care about perfect visual timing. So, to separate causes, it could be useful to first test all this without the ‘EnableHDR’ PsychImaging task, i.e. for display in SDR mode.
More importantly, on dual-display setups you can run into the problem of the video refresh cycles of both monitors not being synchronized. If there is a difference in refresh timing or phase of the display scanout cycles, this can cause all kinds of additional timing stutter and lead to more dropped frames than you would expect. We do have GraphicsDisplaySyncAcrossDualHeadsTest() as a test of synchronized display scanout for MS-Windows. E.g., GraphicsDisplaySyncAcrossDualHeadsTest([2,3]) would test sync between PTB screens 2 and 3. “NVidia Mosaic” mode could help with that, if it is supported by NVidia consumer gpu’s on Windows - I certainly never tested if it exists or works with PTB, as we don’t generally recommend NVidia hardware for optimal use with PTB. If it worked, one could turn both HDR monitors into one ultra-wide virtual HDR monitor and open one onscreen window on that monitor - which spans both physical monitors - and hopefully the NVidia driver would synchronize display scanout across the monitors to avoid some timing judder. But as I said, this was never tested with PTB on Windows; there may be problems wrt. timing, or it may not work with HDR.
Update: Dual-display mode should be workable on GeForce class hardware.
I just tried it under Windows 10 on my GeForce GTX 1650; this is how to enable it:
- Open NVidia control panel → “Configure Surround, PhysX”
- Check the “Span displays with surround” checkbox.
- Press the “configure surround” button and set up your two HDR monitors to get unified into one virtual monitor with 60 Hz refresh rate and 7680 x 2160 pixels resolution.
- Apply settings etc.
- Start Matlab. Then run:
PerceptualVBLSyncTest([],[],[],[],300, 0, 1)
You should get a window spanning both HDR monitors, with a slightly jittering horizontal yellow line around half-way down the display, and close to that a horizontal tear-line, as we intentionally provoke tearing flips. What is important is that the tear-line is at the same vertical position on both monitors → this tells you their refresh cycles are properly synchronized, which is what you want. The ESCape key ends the test, or waiting for 300 seconds.
Repeat with vsync, PerceptualVBLSyncTest([],[],[],[],300, 1, 1), to confirm homogeneous grayscale flicker without a tear-line. Or run PerceptualVBLSyncTest([],4,[],[],300, 1, 1) to confirm that dual-display stereo works. You can also run PerceptualVBLSyncTest([],[],[],[],300, 0, 1, 1) to repeat the test under the Vulkan display backend.
In my test, the sync worked as desired, and HDR mode also worked across both displays, so if that reproduces on your setup, at least dual-display HDR related timing problems should not be an issue. And you could (ab)use stereo mode 4 to draw the separate movies into the left / right display (== left/right eye of a “stereo setup”), as a small convenience.
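If you go that route, a sketch of the stereo mode 4 trick (hedged; assumes `screen_no` refers to the unified Surround monitor, and `tex1` / `tex2` are the current movie frame textures):

```matlab
% Open one window in dual-display stereomode 4 on the unified monitor:
PsychImaging('PrepareConfiguration');
PsychImaging('AddTask', 'General', 'EnableHDR');
win = PsychImaging('OpenWindow', screen_no, 0, [], [], [], 4);

% Per frame: the left "eye" buffer goes to the left monitor, the right
% "eye" buffer to the right monitor:
Screen('SelectStereoDrawBuffer', win, 0);
Screen('DrawTexture', win, tex1);
Screen('SelectStereoDrawBuffer', win, 1);
Screen('DrawTexture', win, tex2);
Screen('Flip', win);
```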
End of update.
We do have some special optimizations (more of a dirty hack actually, but it seems to work) for dual-display stereo HDR-10 in Psychtoolbox for Linux with suitable AMD graphics hardware, which could help with such dual-display HDR setups. You’d need a sufficiently powerful AMD RDNA2 graphics card for that however, and it has some limitations. help PsychImaging, section UseStaticHDRHack, explains that procedure. But maybe let's first stick with your existing Windows + NVidia setup and see how far you get.
Wrt. use of multiple graphics cards: it depends a lot on the use case and specific hardware configuration whether Psychtoolbox can deal well with multiple gpu’s or not. For your specific use case of driving multiple HDR displays, you absolutely want PTB to use only one gpu for driving the displays. Now, in theory you could use multiple gpu’s for the hw-accelerated video decoding, because PTB doesn’t care which gpu the decoded video frames in RAM come from. But in practice, as far as I can see from skimming GStreamer code, the hardware decoder pipeline won’t automatically distribute movie decoding workload across different gpu’s, and the gpu drivers probably won’t do this either, so you would probably still end up with one overloaded gpu and another completely idle gpu with unused decoders. So if you wanted to throw more hardware at the problem, it would be way more advisable to buy one more powerful (and way more expensive!) gpu with multiple hardware decoders, because NVidia docs suggest that decoding of multiple movies gets automatically distributed and load-balanced across multiple NVDEC video decoding engines. But the approaches pointed out above are possibly good enough for your purpose, and free of cost or way cheaper. Also, memory bandwidth and overall latency could become a limiting factor with 3 simultaneous movies at these resolutions and framerates, so a gpu with more decoders may not necessarily help as much.
A few other things wrt. your script, unrelated to the actual problem, just some redundant code:
- If you use PsychDefaultSetup(2); at the very beginning of the script, you can avoid the KbName('UnifyKeyNames'); and AssertOpenGL and AssertGLSL calls - the latter two would be too late in your script to help anyway: AssertOpenGL needs to go before the first Screen command, and PsychImaging would not work for HDR if AssertGLSL were able to detect errors.
- The Screen('SetMovieTimeIndex', movie, 0); calls are redundant, because time index 0 is where every movie starts after ‘OpenMovie’.
- I don’t know what the global GL and glActiveTexture(GL.TEXTURE0); are supposed to do in your sample code? They should be pointless / redundant.
- [win, rect] = PsychImaging('OpenWindow', screen_no, 0); would be enough, as you leave all other parameters at their defaults anyway.
- PsychGPUControl('SetGPUPerformance', 10) doesn’t hurt, but doesn’t do anything on NVidia gpu’s either atm.; it is only implemented for AMD gpu’s.
- PsychImaging('AddTask', 'General', 'FloatingPoint32Bit'); doesn’t hurt, and documents explicitly what precision you want to use for your study. It is implied by ‘EnableHDR’ though, so not strictly technically necessary.
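Putting those notes together, a minimal preamble sketch for your script:

```matlab
% Minimal setup, replacing the redundant calls:
PsychDefaultSetup(2); % Implies AssertOpenGL and KbName('UnifyKeyNames').
PsychImaging('PrepareConfiguration');
PsychImaging('AddTask', 'General', 'EnableHDR');
[win, rect] = PsychImaging('OpenWindow', screen_no, 0);
```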
So far until here,
-mario