Weird Sync Problem on Linux (due to not recommended NVidia graphics)

Hi,
We (Andreas Tolias Lab) have been using Psychtoolbox for over 10 years, but this problem showed up recently after we upgraded everything: computer, video card (GTX 1060), and software (Ubuntu 18.04 in combination with Matlab 2020b). It occurs on multiple setups of ours with similar Ubuntu and Matlab combinations. Previously we were running Ubuntu 16.04, Matlab 2017b and an Nvidia Quadro P400.

To replicate the problem: if we run DriftDemo for the first time after a fresh launch of Matlab, it runs fine and the sync tests pass, with a standard deviation of VBL intervals of about 0.03 msec. All subsequent runs give sync errors, and the standard deviation of VBL intervals is high, up to 0.5 msec. The only way to fix this is to quit Matlab and launch it again.

The same problem occurs in our software: the first call to OpenWindow is fine, and subsequent calls give the error. A fresh launch of Matlab again works for the first OpenWindow call and fails for subsequent calls.

Does anyone have any ideas?

Thanks, Best, Saumil

This exact same thing (first run OK, subsequent runs fail) happens on my Nvidia RTX 2070 (and on a Nvidia T1000 laptop):

[Image: Sync timing on an RTX 2070 under Ubuntu 19.10]

Not all NVidia cards are affected (a Quadro P600 machine works OK, for example), but in general NVidia is not recommended for PTB (proprietary drivers make it very difficult for Mario to dive in and fix things like this). For all my lab machines I use the AMD Radeon Pro WX5100, whose drivers are built into Linux and whose timing and reliability with PTB are flawless.

It would be great if this could be solved, as NVidia cards are far more compatible with other tools (CUDA dominates any sort of machine learning), but I suspect that would require a lot of work on Mario’s side. And because Nvidia is not supportive of open source, any fixes or workarounds Mario could make may break in non-transparent ways when NVidia releases a new driver, etc.

Thanks Ian-Max-Andolina. Yeah, we have tried all possible tweaks, including running a low-latency kernel, but there does not seem to be a consistent fix. What is unclear now is whether this is a real problem or a problem with the test. The image display looks OK despite the failure of the test, but I guess a 0.5 msec standard deviation in timing may not be perceivable. We also have the same problem on RTX 2700. The interesting thing is that the 1060 with 18.04 and the same Nvidia driver, but with Matlab 2018b, works just fine. So I don’t think it is all due to the Nvidia card; there seems to be some strange interaction with Matlab 2020b. The problem for us is that we can’t go back to 2018b because its video player object has a memory leak. I am going to order the AMD card you mention and go from there.
Thanks a lot for the reply. It helps.

Just to clarify a few things: It wouldn’t be “very difficult for me” or “require a lot of work on my side” to work around NVidia proprietary driver problems; it is essentially impossible for me without access to the driver source code. This stuff is very hard and time consuming even with open-source drivers and access to cooperative open-source driver developers, but at least there it is worth it. So either it works or it doesn’t; there is little to nothing I could do, even if I wanted to and had the time. Linux + proprietary NVidia driver puts you in the same helpless situation of kicking against a black box as being on Windows or - heaven forbid - on macOS.

Wrt. CUDA/Nvidia lock-in vs. alternatives: AMD does have ways to support porting CUDA to their open compute stack if one has the source code for the GPU computation, and also for some other frameworks, cf. this recent FOSDEM talk from early February:

https://www.phoronix.com/scan.php?page=news_item&px=LUMI-Preparing-For-AMD-HPC

And I think some frameworks like TensorFlow do have non-CUDA ports, e.g., from AMD iirc. I don’t have any first-hand experience with this stuff myself though, so I can only point out that alternatives exist or are in the making.

Another question would be why you’d need the videoplayer object in the first place, given that PTB has a built-in, higher-performance GStreamer-based video playback engine?

Thanks Mario, we really appreciate your work over these many many years. I have asked our lab manager to sign up for the priority support, so that will be out of the way soon.

Yeah, the machine learning work in the lab (we use PyTorch) keeps us tied to Nvidia, but for these setups the only reason we went with Nvidia is that the same cards work on our monkey setups, which run 2018b and 18.04. The mouse setups in our lab run different software (someone wrote the code using Matlab’s video player), and that forced us to move to 2020a, with the same Nvidia card that worked on the monkey setups. But we have ordered the AMD Radeon card now. Incidentally, we started with Radeons years back when we first moved to Linux from Mac OS, then moved to Nvidia, and now it seems we will have to go back :-).

Thanks for the link on converting CUDA code to run on AMD GPUs. It is about time that someone did that. We too have looked at converting CUDA code to AMD cards, but it was beyond us to undertake such a task and guarantee that everything is actually bug-free while fulfilling the obligations of our grant :-)

Will post soon if all is well after the Radeon card is installed.
Best, Saumil

On a slight aside, another annoyance with AMD cards is that MESA’s OpenCL support is still very poor, but if you want to use AMD’s OpenCL driver it normally requires installing the whole stack, breaking the benefits of the open source drivers for PTB. I found this script that cherry picks AMD’s OpenCL but keeps MESA working fine for PTB:

I’m using Blender for some projects and it works with OpenCL acceleration after this with PTB unaffected. Ubuntu 20.10. YMMV.

Good! Feel free to buy several of those if your lab can afford it; way more uptake is needed than what we’ve seen so far.

With the qualifier that I do not have any experience with PyTorch or with AMD’s current GPU compute offerings, let me point you to this AMD website, which suggests there now exists some PyTorch support for AMD cards as well. Maybe worth a try once you have your AMD card?

https://www.amd.com/en/graphics/servers-solutions-rocm-ml

Ok, but you can also have parallel installations of R2018b and R2020b, depending on use case. And there’s also excellent Octave support with PTB, so there should be ways to work around your NVidia + Matlab trouble to some degree?

[quote]

the AMD Radeon card now. Incidently, we started with Radeons years back when we first moved to Linux from Mac OS, then moved to Nvidia and now seems we will have to go back :-).
[/quote]

AMD, or Intel graphics for less demanding workloads on Linux (not on Windows, where Intel usually sucks), was already the right choice 10 years ago. It has only improved since then.

Just make sure not to repeat the mistakes some people make when setting up for AMD. A quick search of the forum or a careful read of our recently refined system requirements and download instructions for Linux will tell you. E.g., don’t install any external AMD graphics drivers, just use what is installed on Ubuntu by default, make sure to do the matlab-support package install or the setup steps described, …

-mario

They are working on those problems. Cf. this post by AMD’s John Bridgman from yesterday:

From what I’ve read (or rather skimmed on the side, with no first-hand experience at all), a less “plug and play” setup experience for AMD’s compute stack seems to be the main criticism they are facing, and they are apparently working on resolutions, whereas performance etc. seems to be pretty good.

One last update: The latest PyTorch version 1.8, released on 4th March, now has official built-in ROCm support for AMD.

-mario

Hi Saumil,

Did you try the official driver from NVidia? PTB works fine on our setups, which are quite similar to the setups in your lab (Ubuntu 18.04 + GTX 1060), except that we use the official driver from NVidia.

For Ubuntu 18.04 we installed the official driver via the following method:

    sudo add-apt-repository -y ppa:graphics-drivers/ppa
    sudo apt-get -y update

Then (very important!): do not install the driver from the command line, or else you will get stuck at the login screen. Instead, manually update your NVidia driver via the Additional Drivers utility (under the Software & Updates panel) and select the latest driver for NVIDIA.

Best
-Yang

@yangzhangpsy The cause of their problems and many other NVidia related problems is use of the official driver from NVidia! That’s the whole point, that that driver has limitations and weird interactions and sometimes unexpected behaviour which causes problems. Because that driver is not open-source at all, but highly proprietary, we can’t do anything to help with problems in most cases. From their bug report you can see that the same proprietary driver works fine with one Matlab version, but not the other. So now you have unknown bad interactions between two closed down pieces of software which we can’t properly debug.

What does work with reliable timing, because it supports the same mechanisms as Intel and AMD, co-written by myself, is the open-source driver. But that driver is not an option for modern-generation NVidia GPUs if performance or use of CUDA is of any concern, as NVidia effectively blocks all but the lowest performance mode of the GPU for anything but their proprietary driver, by not providing the cryptographically signed firmware one would need to enable high-performance modes. Various other important features are also not supported by Nvidia’s proprietary blob.

That’s why we recommend against NVidia, because you are basically forced to install their own proprietary and often troublesome driver if you want to get good performance.

Unless you mean a different version of the official driver from NVidia, because if you’d try enough different versions of their drivers without screwing up your system, you might get lucky and one of them might work for your configuration.

Yang,
Yes, we installed the official driver. Do you use Matlab 2020b? The problem does not occur with Matlab 2018b in a setup with the same Ubuntu and video card.
The puzzling part is that Psychtoolbox’s OpenWindow command works just fine the first time after Matlab is started; only from the 2nd time on does the OpenWindow command get very high variance in VBL timestamps. The beamposition timestamps are precise and the same as in the first run.
We are just going to try an AMD card now.
Thanks for your help, Best, Saumil

Hi saumil,

The computer in my lab uses MATLAB 2019a/b, and the driver version is “Nvidia-driver-460-server (proprietary)”.

However, yesterday I tested three software configurations (Octave, Matlab 2020b -nodesktop, and Matlab 2020b with desktop) with WiFi on and off, using the same driver version (Nvidia-driver-460-server (proprietary)), on one of our computers. See the attached log files for details.

In brief:

  1. Octave and Matlab 2020b in nodesktop mode are fine (all passed the test).

  2. For all successful tests, results are better with WiFi turned off (the SDs are smaller in WiFi-off mode).

  3. Matlab 2020b in normal (desktop) mode always failed the test.

So it looks like, for MATLAB 2020b with the proprietary driver from NVidia, the better choice is to run Matlab in nodesktop mode.

Hope it helps a little bit.
Best
-Yang

Log info:

sudo matlab -nodesktop

                        < M A T L A B (R) >
              Copyright 1984-2020 The MathWorks, Inc.
              R2020b (9.9.0.1467703) 64-bit (glnxa64)
                          August 26, 2020

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% sudo matlab -nodesktop
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% with wifi disabled
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% OpenGL-Renderer is NVIDIA Corporation :: GeForce GTX 1060 3GB/PCIe/SSE2 :: 4.6.0 NVIDIA 460.32.03

PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.662860 ms [60.013707 Hz]. (50 valid samples taken, stddev=0.025515 ms.)

PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.663599 ms [60.011046 Hz]. (50 valid samples taken, stddev=0.071475 ms.)

PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.662164 ms [60.016215 Hz]. (50 valid samples taken, stddev=0.033461 ms.)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% with wifi enabled
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.663580 ms [60.011114 Hz]. (50 valid samples taken, stddev=0.174694 ms.)

PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.664281 ms [60.008590 Hz]. (50 valid samples taken, stddev=0.096693 ms.)

PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.664381 ms [60.008229 Hz]. (50 valid samples taken, stddev=0.036289 ms.)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% OCTAVE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sudo octave

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% With WIFI disabled
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sudo octave
GNU Octave, version 5.2.0

octave:1> PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.662273 ms [60.015823 Hz]. (121 valid samples taken, stddev=0.199727 ms.)
octave:2> PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.662917 ms [60.013501 Hz]. (50 valid samples taken, stddev=0.068506 ms.)
octave:3> PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.661220 ms [60.019616 Hz]. (50 valid samples taken, stddev=0.087501 ms.)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% With WIFI enabled
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

octave:4> PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.663084 ms [60.012900 Hz]. (50 valid samples taken, stddev=0.041307 ms.)
octave:5> PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.662378 ms [60.015442 Hz]. (50 valid samples taken, stddev=0.072841 ms.)
octave:6> PerceptualVBLSyncTest(0)
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.663179 ms [60.012557 Hz]. (50 valid samples taken, stddev=0.054387 ms.)
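
To compare runs like the ones in the log above without reading the numbers off by eye, the PTB-INFO lines can be parsed automatically. What follows is only a minimal Python sketch and not part of Psychtoolbox: the names `parse_sync_log` and `mean_stddev_ms` are made up for illustration, and the regular expression assumes exactly the line format shown in this log (other PTB versions may word the line differently).

```python
import re
from statistics import mean

# Pattern for the "PTB-INFO: Measured monitor refresh interval ..." lines,
# written against the format seen in the log above (an assumption).
PTB_INFO = re.compile(
    r"refresh interval from VBLsync = (?P<interval>\d+\.\d+) ms "
    r"\[(?P<hz>\d+\.\d+) Hz\]\. "
    r"\((?P<samples>\d+) valid samples taken, stddev=(?P<sd>\d+\.\d+) ms\.\)"
)


def parse_sync_log(text):
    """Return one dict per PTB-INFO refresh measurement found in text."""
    rows = []
    for match in PTB_INFO.finditer(text):
        rows.append({
            "interval_ms": float(match.group("interval")),
            "refresh_hz": float(match.group("hz")),
            "samples": int(match.group("samples")),
            "stddev_ms": float(match.group("sd")),
        })
    return rows


def mean_stddev_ms(rows):
    """Average the reported stddev over several runs (None if no rows)."""
    if not rows:
        return None
    return mean(row["stddev_ms"] for row in rows)


if __name__ == "__main__":
    # Two lines copied from the "matlab -nodesktop, wifi disabled" section.
    log = (
        "PTB-INFO: Measured monitor refresh interval from VBLsync = 16.662860 ms "
        "[60.013707 Hz]. (50 valid samples taken, stddev=0.025515 ms.)\n"
        "PTB-INFO: Measured monitor refresh interval from VBLsync = 16.663599 ms "
        "[60.011046 Hz]. (50 valid samples taken, stddev=0.071475 ms.)\n"
    )
    rows = parse_sync_log(log)
    for row in rows:
        print(row["samples"], "samples, stddev", row["stddev_ms"], "ms")
    print("mean stddev:", mean_stddev_ms(rows), "ms")
```

Pasting each section of the log (Matlab vs. Octave, WiFi on vs. off) into its own call makes it easy to check whether a difference between conditions holds on average or only for single runs.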

Thank you Mario for the detailed explanation.

It is really helpful.
Best
Yang

Caution: You should never need to run octave or matlab with sudo on a properly set up system (set up via DownloadPsychtoolbox/UpdatePsychtoolbox/SetupPsychtoolbox when downloading from us, or via a manual run of PsychLinuxConfiguration if installed from NeuroDebian/Debian/Ubuntu via the package manager). PTB does not need sudo / admin permissions. It can’t help, but it can hurt in various ways, e.g., by writing files with owner root instead of your normal user account. Only the root user (via sudo) will then have permission to change or delete those files, which makes it difficult or impossible to access them from your normal user account, i.e. without sudo. E.g., result files from your experiments that you can’t delete anymore without sudo, or the command history of octave/matlab, which will no longer work if you run as normal user.
And of course one wrong delete command in Matlab under sudo may destroy your system – after all the sudo root user is omnipotent and most child protections are disabled.

-mario
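
If octave or matlab has already been run with sudo, the files it left behind can be located before they cause trouble. The following is a minimal Python sketch, assuming a Linux system; `find_foreign_owned` is a hypothetical helper name, not a Psychtoolbox or system tool, and which directory to scan is up to you.

```python
import os


def find_foreign_owned(root, uid=None):
    """List paths under root whose owner differs from uid.

    uid defaults to the current user, so files left behind by a
    'sudo matlab' or 'sudo octave' session (owner root, uid 0) show up.
    """
    if uid is None:
        uid = os.getuid()  # Unix-only call
    foreign = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat inspects symlinks themselves instead of following them
                if os.lstat(path).st_uid != uid:
                    foreign.append(path)
            except OSError:
                # Entry vanished or cannot be inspected; skip it in this sketch
                continue
    return sorted(foreign)


if __name__ == "__main__":
    # Example only: scan the home directory of the current user
    for path in find_foreign_owned(os.path.expanduser("~")):
        print(path)
```

Anything it lists under your own experiment or settings folders is a candidate for handing back to your user account with `sudo chown`.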

Wow… thanks Yang, that may actually be the difference between our setups where it works and those where it does not. Will check soon. In the meantime, I have tested an AMD Radeon Pro WX3200 and it has no problems with the VBL test, but I miss the Nvidia settings GUI :-) Thanks, Best, Saumil

Actually, in the setups that fail, we are running the Matlab engine from Python, so it is running in nodesktop mode. But the testing that we do is in the desktop environment. So far the failures correlate between the two modes, but I have not yet tested in nodesktop mode without the Python Matlab engine.
Thanks, Best, Saumil

Thank you Mario for your kind reminder.

BTW, a little bit of updated info:

For multi-screen setups, PTB is OK when the window is opened on a screen other than the one showing the MATLAB desktop, even in normal (desktop) MATLAB mode.

Best
-Yang

Yang,
Phew… this is the first time mail from this forum came to my mailbox; all previous mails were sent to clutter.
Thanks… We are using 2 screens; the Psychtoolbox screen has the highest screen number, and Matlab + Linux runs on the lower screen number. So there is another factor causing the problem. Haven’t checked nodesktop mode without the Python engine yet.
Thanks, Best, Saumil