AMD card for Neurodebian/Win10

Hey all,

 

Is the current accepted wisdom AMD for Neurodebian (and, sigh, Win10)?

http://psychtoolbox.org/requirements.html#graphics-hardware-requirements suggests the open source drivers are best.

 

If so, which card is the current winner for 120 Hz, outputting to two mirrored screens?

The cookbook here (https://github.com/Psychtoolbox-3/Psychtoolbox-3/wiki/Cookbook:-Setting-up-Ubuntu-with-Modern-AMD-Cards) suggests the Radeon™ Pro WX 4100 Graphics (<£300).

 

Is this still suitable, or is there any reason to consider the AMD RADEON PRO WX 9100 16GB (>£1400)?

 

Thanks all,

 

Dagmar Fraser

BUIC Research Technician

----
Hi, I'm using a single Display++ at 120 Hz with a WX 4100 (MATLAB 2018a + Ubuntu 18.04.1), and I am currently having problems with dropping frames. The IFI at 120 Hz is ~8.3 ms, and I'm using an EyeLink and sending strobes via the Display++ etc., so my overall loop overhead has me right up against the wall. We also bought a WX 5100 which seems slightly better, but certainly doesn't solve our problems. One problem is that using the Bits++ modes drops max FPS by up to 100 fps, which, added to strobe management and eye-position monitoring with complex stimuli, appears significant (things are better if I use FloatingPoint32Bit). So one question is whether your experimental design requires complex graphics and/or other overhead?

Now I don't have experience of the WX 7100 or WX 9100, but I would imagine the greater memory bandwidth and size if nothing else should help? Some other users here are using the WX 7100 (same 8GB as the WX5100 but more bandwidth) so perhaps they can comment on their experiences. WX9100 has 16GB and more bandwidth still, but the price is pretty eye-watering! 

Mario maybe has some better idea about what stereo @ 120Hz may require GPU wise...

Ian
 
----

I'm also using the low-latency kernel and have tested with/without the latest mesa drivers, but nothing really helps much (or with the latest mesa drivers, MATLAB becomes unstable). Until MATLAB supports lightweight threads (parallel MATLAB is not an answer, I've tried it), I think my loop overhead at least will not be helped much by changing GPU.

iandol@... [PSYCHTOOLBOX] <PSYCHTOOLBOX@yahoogroups.com> wrote:
>
> Hi, I'm using a single Display++ at 120 Hz with a WX 4100 (MATLAB 2018a + Ubuntu 18.04.1), and I am currently having problems with dropping frames. The IFI at 120 Hz is ~8.3 ms, and I'm using an EyeLink and sending strobes via the Display++ etc., so my overall loop overhead has me right up against the wall. We also bought a WX 5100 which seems slightly better, but certainly doesn't solve our problems. One problem is that using the Bits++ modes drops max FPS by up to 100 fps, which, added to strobe management and eye-position monitoring with complex stimuli, appears significant (things are better if I use FloatingPoint32Bit). So one question is whether your experimental design requires complex graphics and/or other overhead?
>

What do you mean by "better" if you use FloatingPoint32Bit?
I assume you follow optimizations like ordering your loop as

1. Drawing commands.
2. Screen('DrawingFinished',...)
3. Other logic if possible.
4. Flip.

Priority() scheduling etc.
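
For concreteness, a minimal sketch of that ordering; win, ifi, nFrames and myDrawStimulus are placeholder names (not from this thread), with the window assumed to be already open via PsychImaging('OpenWindow'):

% Sketch of the loop ordering above, not code from this thread.
Priority(MaxPriority(win));           % raise to realtime priority for the trial
vbl = Screen('Flip', win);            % baseline flip timestamp
for frame = 1:nFrames
    myDrawStimulus(win, frame);       % 1. all drawing commands for this frame
    Screen('DrawingFinished', win);   % 2. hand the frame to the GPU now
    % 3. other per-frame logic (eyetracker check, strobe setup, ...) runs here,
    %    overlapping with the GPU finishing the render.
    vbl = Screen('Flip', win, vbl + 0.5 * ifi);  % 4. flip, targeting the next refresh
end
Priority(0);                          % back to normal scheduling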

Obviously, if you don't need to wait for Flips to complete, because you use
the BitsPlusPlus() function's ability to send T-Lock driven strobes via the
Display++, you could use a non-blocking Flip via 'vblsynclevel' 1
instead of the default 0 and maybe squeeze out a bit of extra time for
your loop. On Linux with the open-source drivers one can use the
Screen('GetFlipInfo') functionality (cf.
PerceptualVBLSyncTestFlipInfo...) to log and get Flip timestamps after
the fact, i.e., without waiting for a flip to complete, or by collecting
them at the end of a trial. Iow. squeeze out more parallelism between
gpu and cpu processing.
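
Reading 'vblsynclevel' as the fourth ('dontsync') argument of Screen('Flip'), a rough sketch of this pattern might look as follows (placeholder names again; the exact Screen('GetFlipInfo') enable/query arguments are shown in PerceptualVBLSyncTestFlipInfo and are not spelled out here):

% Sketch only, under the assumption that dontsync = 1 means "schedule the flip
% at the next VBL but return immediately, without blocking on completion".
for frame = 1:nFrames
    myDrawStimulus(win, frame);
    Screen('DrawingFinished', win);
    Screen('Flip', win, 0, 0, 1);   % 4th argument (dontsync) = 1: non-blocking flip
    % Timestamps returned by a non-blocking Flip are not valid, so log them via
    % Screen('GetFlipInfo') instead and fetch them at the end of the trial
    % (see PerceptualVBLSyncTestFlipInfo for the exact enable/query calls).
    sendStrobesAndCheckEyelink();   % placeholder for the per-frame I/O work
end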

If the loop overhead is mostly caused by things like Eyelink then the
gpu would not help much of course. Multithreading wouldn't help either
if your design would be something like a gaze-contingent display i
guess. Have you profiled where time is spent?

>
> Now I don't have experience of the WX 7100 or WX 9100, but I would imagine the greater memory bandwidth and size if nothing else should help? Some other users here are using the WX 7100 (same 8GB as the WX5100 but more bandwidth) so perhaps they can comment on their experiences. WX9100 has 16GB and more bandwidth still, but the price is pretty eye-watering!
>
> Mario maybe has some better idea about what stereo @ 120Hz may require GPU wise...
>
> Ian
>
> ----
>
> I'm also using the low-latency kernel and have tested with/without the latest mesa drivers, but nothing really helps much (or with the latest mesa drivers, MATLAB becomes unstable). Until MATLAB supports lightweight threads (parallel MATLAB is not an answer, I've tried it), I think my loop overhead at least will not be helped much by changing GPU.
>

How does Matlab become unstable with the latest Mesa drivers?
-mario

Many thanks for the informative answer.

 

Unfortunately I will be stuck with mirroring between a projector (into the fMRI chamber) and an LCD monitor in the console room.

I will try a hardware Gefen DVI-Splitter (http://www.gefen.com/product/12-dual-link-dvi-distribution-amplifier-EXT-DVI-142DLN) rather than software mirroring, given your dire warnings.

 

I’ll go with the AMD RADEON PRO WX 5100 8GB GDDR5 (Polaris 10, with 8GB and double the bandwidth of the Polaris 11 WX 4100), for only a £100 price delta.  Mario, can you email me privately at d.s.fraser@... and I’ll attempt to get a WX 4100 to you for permanent use.

 

Thanks all,

 

d

 

 

From: PSYCHTOOLBOX@yahoogroups.com [mailto:PSYCHTOOLBOX@yahoogroups.com]
Sent: 18 September 2018 17:09
To: Psychtoolbox psychtoolbox
Subject: Re: [PSYCHTOOLBOX] AMD card for Neurodebian/Win10

 

 

Hi Dagmar,

d.s.fraser@... [PSYCHTOOLBOX] <PSYCHTOOLBOX@yahoogroups.com> wrote:

> Hey all,
>
> Is the current accepted wisdom AMD for Neurodebian (and, sigh, Win10)?
>
> http://psychtoolbox.org/requirements.html#graphics-hardware-requirements suggests the open source drivers are best.
>

Yes, AMD graphics cards with the open-source drivers which are already
installed by default, iow. if you do nothing, you do it right. Ubuntu
18.04.1 LTS would be the current proper distro to use for
"future-proof-ness", as this is the main development/test platform for
upcoming Psychtoolbox 3.0.15, although Ubuntu 16.04.5-LTS would be
equivalent atm., but i will change our documentation soon to assume
18.04 LTS.

> If so, which card is the current winner for 120 Hz, outputting to two mirrored screens?

Outputting to mirrored screens is usually not a good idea, unless the
screens are monitors of 100% identical model and vendor, set to
exactly the same resolution and refresh rate - and possibly using
PTB's display-sync command. It is probably even a worse idea on
Windows 10, given all the faultiness of Windows with multi-display
setups. Presentation timing and reliability will usually suffer
greatly unless you get everything just right.

I hoped to add some hacks to the open-source drivers to allow at least
stimulation display + control monitor mirroring, e.g., for typical
fMRI scanner setups, but was way too overwhelmed with other work
during the last months, so that hasn't happened yet.

>
> The cookbook here (https://github.com/Psychtoolbox-3/Psychtoolbox-3/wiki/Cookbook:-Setting-up-Ubuntu-with-Modern-AMD-Cards) suggests the Radeon™ Pro WX 4100 Graphics (<£300).

It was one card tested by Ian. In principle any modern AMD card should
do, depending on your performance needs. E.g., i had access to a
Radeon R9 380 Tonga Pro which could easily drive a VPixx 120 Hz 1920x1080
display without any performance problems, including all bells
and whistles like high bit depths.

That said, at the moment i do not have any access to any modern AMD
card, because the lab owning the R9 380 moved away and the pathetic
financial situation of PTB made it impossible to buy any modern card.
Therefore i couldn't test any modern AMD card in the last six months
and will just assume the drivers continue(d) to work as well as they
did in the past, hearing nothing to the contrary. The last lightly
tested card is some 5 year old HD-8000 series card in the laptop of my
flatmate. The only regularly tested one atm. is a 9 year old HD-5770.

Polaris class gpu's like the WX4100 are probably the sweet spot atm.
in terms of reliability. They are from the previous generation, and
the last gpu's that are supported by both the old kms display control
code, in whose development i was heavily involved in the past, and the
new "Display Core" (DC) code, so they provide the best redundancy in case
bugs would interfere with some functionality.

The latest generation of Vega class gpu's only works with the new DC
code, to which i so far only contributed some bug-fixes. They probably
work just fine and have some extra features, but are so far entirely
untested by myself for compatibility with Psychtoolbox.

> Is this still suitable, or is there any reason to consider the AMD RADEON PRO WX 9100 16GB (>£1400)?

Unless you need very high performance, or want to waste lots of money,
probably not. Btw. very high-end cards like those are unlikely to be
better supported/tested by myself, because i certainly couldn't ever
afford spending so much money on a graphics card, so anything i'd ever
buy for development and testing would be modern (Polaris and Vega at
this time) but rather mid-range and low-priced.

-mario

> Thanks all,
>
> Dagmar Fraser
>
> BUIC Research Technician

iandol@... wrote:

> What do you mean by "better" if you use FloatingPoint32Bit?

I have a benchmarking mode (dontsync==2, flip as fast as possible), and with `FloatingPoint32Bit`, depending on the stimuli, I get about 100 fps more than when using the EnableBits++ modes. Bits++ is also slower than Mono++ mode, FWIW.


-> Ok, that's expected. Mono++ needs a custom shader to execute, vs. only a simple fixed-function pipeline emulation shader for FloatingPoint32Bit. Color++ mode (i assume you meant that instead of Bits++) requires a shader that has to do branching (an if-else statement), executing different code paths depending on whether it is writing color info to an even or odd column of the framebuffer.

gpu's don't like that performance-wise, because computations are usually executed with the same instructions in parallel on a block of adjacent pixels in the framebuffer, e.g., 2x2, 4x4, 8x8 or such (SIMD paradigm with masking). If all target pixels in such a block take the same branch of an if-else statement, for/while loop, or switch-case statement, then the hw only has to compute that branch. If different pixels in the block evaluate their branch condition to different truth values, e.g., half of them want to take the if-then branch and the other ones the else branch, then the hw has to execute both branches for each pixel, then throw away the results for the "branch not (supposed to be) taken" - iow. you pay with a performance loss if decisions in the shader are not taken the same way for a spatially coherent group.

Or simply said, shaders don't like complex if-else statements etc.; you pay for them, depending on the task at hand. Specific details vary by gpu model and vendor - and this explanation is quite a bit of an oversimplification, just to give you an idea of why this happens.

As you are one of the few people contributing shaders to Psychtoolbox:

That's why sometimes it makes sense to write two different versions of a shader for different modes of operation, instead of one shader with an if-then-else branch. Or split up computation in multiple simple passes with less branching. In the end it is a tradeoff between squeezing out more performance vs. maintenance overhead for us having more shader variants and more need for testing different cases.

If the condition for if-then-else/for-loops/while-loops etc. doesn't depend on "pixel local" or "texel local" variables (e.g., varyings or texture input in GLSL), but just on the value of uniforms, that's less of a problem, as then certain variables can be treated as constants for a large part of a render-pass -- in fact, the shader compiler may even turn such uniforms into constants and then build an instance of the shader specific to that setting, applying more optimizations. Also slowly varying input often allows larger groups of pixels to only take one common branch.
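
For anyone following along without a CRS setup: the modes being compared above are selected with PsychImaging tasks at window-open time. A minimal sketch, with task names as I read the PsychImaging help and all Display++-specific device setup omitted:

% Pick exactly one of the three AddTask lines per session; this is only meant
% to show which imaging-pipeline mode each name maps to.
PsychImaging('PrepareConfiguration');
PsychImaging('AddTask', 'General', 'FloatingPoint32Bit');            % plain 32 bpc float framebuffer (fastest in Ian's tests)
% PsychImaging('AddTask', 'General', 'EnableBits++Mono++Output');    % CRS Mono++: extra output-formatting shader
% PsychImaging('AddTask', 'General', 'EnableBits++Color++Output', 2); % CRS Color++: branching shader, slowest
                                                                      % (trailing mode argument is one of 0/1/2 per the help)
win = PsychImaging('OpenWindow', max(Screen('Screens')), 0.5);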
> I assume you follow optimizations like ordering your loop as

In general I try to always follow that order, yes, although for example I do need to get the EyeLink sample before I draw the stimuli, depending on the logic needed. If I check after drawing then there is a one-flip delay if fixation was broken etc. I've profiled the simple EyeLink getsample and it occupies a fraction of a millisecond. I do use an object-oriented (OO) finite state machine to drive the logic of experimental tasks, and was worried that OO overhead may contribute[1]. I've profiled using no-op methods (i.e. all the OO call overhead is there but no other processing), and the state machine overhead is negligible.
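
The pre-draw check described here is roughly the following pattern (a sketch with placeholder names: eyeUsed from Eyelink('EyeAvailable'), el from EyelinkInitDefaults, and fixX/fixY/fixRadiusPix defined by the task):

% Sketch: grab the newest gaze sample before drawing, so a broken fixation can
% be handled on this frame instead of one flip later.
if Eyelink('NewFloatSampleAvailable') > 0
    evt = Eyelink('NewestFloatSample');    % latest gaze sample from the tracker
    gx  = evt.gx(eyeUsed + 1);             % gaze position in screen pixels
    gy  = evt.gy(eyeUsed + 1);
    fixating = gx ~= el.MISSING_DATA && gy ~= el.MISSING_DATA && ...
               hypot(gx - fixX, gy - fixY) <= fixRadiusPix;
end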

> Priority() scheduling etc.

I use the recommended max priority == 1 for Linux; I've tried higher values just in case, without much effect. I do notice priority doesn't affect the `nice` value - would changing that have any effect?

-> No. The 'nice' value is only used for non-realtime scheduling, to distribute cpu time fairly (on average) across non-realtime processes. RT scheduling plays by different rules, and higher Priority() value means higher priority. The current setup file in /etc/security/limits.d/psychtoo... defines values between 0 and 50 as allowed for users in the psychtoolbox unix group. The range goes up to 99 iirc.
> Obviously, if you don't need to wait for Flips to complete, because you use the BitsPlusPlus() function's ability to send T-Lock driven strobes via the Display++, you could use a non-blocking Flip via 'vblsynclevel' 1

I haven't tried this yet, something to look into, thanks. I am also trying to use the undocumented T-Lock2 system (the T-Lock can trigger I/O on the same flip rather than the subsequent one), but haven't yet got it set up.

>Multithreading wouldn't help either if your design would be something like a gaze-contingent display i guess. Have you profiled where time is spent?

Yes, the majority of the time is spent in Screen drawing subcommands. I don't need gaze-contingent drawing at the moment, but do need accurate fixation testing and communication with the EyeLink and other equipment via strobed words.

-> Btw. for some simple assessment of where graphics time is spent, there's the GALLIUM_HUD environment variable (https://www.mesa3d.org/envvars.html), e.g., launch with GALLIUM_HUD=blah octave. Running GALLIUM_HUD=help glxinfo is an easy way to list all supported options. This will draw an overlay on top of your rendering, with graphs that give an idea of where processing time is spent, whether the gpu is running at maximum clock, etc.
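
The HUD can probably also be enabled from within MATLAB, on the assumption that Mesa reads the variable when the GL context is created, i.e. when the first onscreen window opens; if that assumption turns out to be wrong, export it in the shell before launching MATLAB instead:

% Sketch: enable the Mesa Gallium HUD before opening the PTB window.
% Counter names vary by driver/GPU; "GALLIUM_HUD=help glxinfo" lists them.
setenv('GALLIUM_HUD', 'fps,cpu,GPU-load');
PsychImaging('PrepareConfiguration');
win = PsychImaging('OpenWindow', max(Screen('Screens')), 0.5);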

There are more sophisticated tools like FrameRetrace to get to the bottom of where time is spent: https://www.youtube.com/watch?v=q-5YkK3dGtI

> How does Matlab become unstable with the latest Mesa drivers?

This was with the padoka stable PPA a few weeks back (which was 18.1.something IIRC); I found that I would get non-reproducible MATLAB Java errors (i.e. internal errors), often occurring after a PTB run when using a GUIDE GUI. Unreproducibly, MATLAB would freeze and require a killall. If it had been a reproducible error I would have reported it. Perhaps things are now better with Mesa 18.2?

-> Ok.

-mario


Thanks as always for your knowledge and suggestions, Ian


----
[1] MATLAB's original OO implementation was significantly slower than functional programming, but this has changed drastically over the last few years...