Limit for the Flip 'when' timestamp on Linux with Intel GPU?

Dear Mario and list,

is there possibly an upper limit for the 'when' timestamps used for Screen('Flip', win, when) on (somewhat older) Intel GPUs? When using timestamps more than about 17 seconds in the future, we consistently get the following error on two different machines:

PTB-ERROR: OpenML timestamping reports that flip completed before its requested target time [Target no earlier than 1531396898.530205 secs, completed at 1531396881.867865 secs]!
PTB-ERROR: Something is wrong with swap scheduling, a misconfiguration or potential graphics driver bug!Check your system setup!
PTB-ERROR: Switching to alternative fallback scheduling method.

This happens on two Linux computers (one PC with Ubuntu 16.04, Sandybridge CPU/GPU, kernel 4.13.0-26-lowlatency, Mesa 17.2.4, PTB build date Oct 3, 2017, Octave 4.0; one laptop with Ubuntu 18.04, Ivybridge CPU/GPU, kernel 4.15.0-23-lowlatency, Mesa 18.0.5, PTB build date Apr 6, 2018, Octave 4.2). It does not happen on a PC with very similar specs but with Nvidia graphics card and Nouveau driver.

Minimal code to reproduce the problem is:
winptr = Screen( 'OpenWindow', 0 )
t0 = Screen( 'Flip', winptr )
t1 = Screen( 'Flip', winptr, t0 + 16.679 )
sca

If I reduce the 16.679 s delay by one ms to 16.678 s, no error occurs (on the laptop; the exact limit was not tested on the other PC, so it might be machine specific). The behavior is consistent across reboots and Octave restarts. I did not find any previous mentions of this error on the mailing list. I hope I didn't miss anything obvious.

Any idea? Thank you!
Andreas
<widmann@...> wrote :

Dear Mario and list,

-> Hi Andreas,

indeed there is a bug in the modesetting-ddx video driver that is used for Intel graphics chips on Ubuntu 18.04 LTS and also on some recent versions of Ubuntu 16.04, maybe from around 16.04.3 or 16.04.4 LTS. This bug causes any swap scheduled >= 1000 video refresh cycles into the future to execute immediately. Your 16.679 seconds (bad) vs. 16.678 seconds (good) probably just straddles that limit, i.e., 1000 frames vs. 999 frames, given a ~60 Hz display. The bug went unnoticed by me for quite a while because I usually only test up to 10 seconds into the future.
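To make the 1000-frame limit concrete, here is a rough sketch that estimates how many refresh cycles ahead a requested flip time lies. The names framesAhead, isAtRisk and BUG_FRAME_LIMIT are made up for illustration, and dividing the delay by the refresh interval is only an approximation of how the target time is mapped to a vblank count, so the exact boundary on a given machine may differ slightly.

```octave
% Rough estimate only: how many refresh cycles ahead a requested flip time is.
% ifi is the refresh interval in seconds, e.g. from Screen('GetFlipInterval', win).
framesAhead = @(delaySecs, ifi) delaySecs ./ ifi;

% Threshold taken from the bug description above (>= 1000 refresh cycles).
BUG_FRAME_LIMIT = 1000;
isAtRisk = @(delaySecs, ifi) framesAhead(delaySecs, ifi) >= BUG_FRAME_LIMIT;

ifi = 1 / 60;               % assumed nominal 60 Hz display
disp(isAtRisk(10, ifi));    % prints 0: about 600 frames ahead
disp(isAtRisk(20, ifi));    % prints 1: about 1200 frames ahead
```

On a real setup the measured refresh interval should be used instead of the nominal 1/60 s.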

I found and fixed the bug in early May for X-Server 1.20, but current Ubuntu still ships with server 1.19.6:

https://cgit.freedesktop.org/xorg/xserver/commit/?id=73f0ed2d928afc692ed057eb3d7627328a6e5b12

I haven't gotten around to submitting the same fix for server 1.19 yet, but your report answers the question of whether this matters in practice and is worth the effort even for a soon-to-be-retired server version.

One simple way around this is to use XOrgConfCreator + XOrgConfSelector to create an xorg.conf file that selects the intel-ddx video driver instead of the buggy modesetting-ddx video driver -- nice that we have these redundancies. I think you have to ask XOrgConfCreator to let you configure special features, and when it asks whether you want to use the new modesetting-ddx, you answer (n)o instead of (y)es or (d)on't care. -> Save -> Select config -> logout -> login.

PTB will also work around the problem by itself: when this happens, it botches the first such flip and prints the error message you quoted, then switches to the "alternative fallback scheduling method". So our advanced error detection and correction on Linux works as desired. Triple redundancy is a nice thing.
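For scripts that must avoid even that first botched flip, one possible script-side approach (an untested assumption, not something confirmed in this thread) is to wait in user code until the target time is fewer than 1000 refresh cycles away and only then request the flip. The helper name flipIssueTime and the safety margin of 900 frames are hypothetical.

```octave
% Hypothetical helper, not part of Psychtoolbox: earliest time at which the
% flip can be requested so that tWhen lies fewer than maxFrames refresh
% cycles in the future. Returns tNow if the target is already close enough.
flipIssueTime = @(tNow, tWhen, ifi, maxFrames) max(tNow, tWhen - maxFrames .* ifi);

% Example with an assumed 60 Hz display and a target 30 s ahead:
issueAt = flipIssueTime(0, 30, 1/60, 900);   % roughly 15 s

% Usage sketch (needs Psychtoolbox and an open window winptr):
%   ifi   = Screen('GetFlipInterval', winptr);
%   tWhen = t0 + 30;
%   WaitSecs('UntilTime', flipIssueTime(GetSecs, tWhen, ifi, 900));
%   t1    = Screen('Flip', winptr, tWhen);
```

Since Screen('Flip') with a 'when' argument blocks until the flip completes anyway, the extra wait should not change the overall timing of the script.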

Btw. on that machine with Mesa 17.2.4, could you do me a favor and run a script while at least Priority(1) is used? E.g., Priority(1); DotDemo; Priority(0)? Also on the Mesa 18.0.5 machine? And let me know how much RAM those machines have? I know of a likely Mesa bug introduced sometime around Mesa 18.0.0 with Intel graphics that causes gfx-chip lockups on some chips with some machines when Priority(1) is used to lock system memory. Symptoms are freezes of multiple seconds during visual stimulation and "gfx chip lockup" messages in "dmesg" output (drm error ... ecode soandso ...Hang in rcs detected, ......, resetting chip ...).

This bug also has an easy and effective workaround. It slipped through my testing for too long due to lack of time and suitable hardware. Tracking it down may be tedious, but could be somewhat less tedious if I can narrow down which Mesa versions are affected or unaffected, or how the amount of installed RAM triggers or prevents it. I know Mesa 18.0.0 has the bug, and Mesa 10.1.3 and maybe 17.0 didn't have it.
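The requested Priority(1) test could be scripted roughly as follows. This is only a sketch: findHangLines is a hypothetical helper, and its keywords are guesses based on the symptoms described above, not an authoritative list of kernel messages.

```octave
% Hypothetical helper: return the lines of a kernel log text that look like
% GPU hang reports. The keywords are guesses, matched case-insensitively.
findHangLines = @(logText) regexp(logText, ...
    '[^\n]*(hang|resetting chip)[^\n]*', 'match', 'ignorecase');

% Usage sketch (needs Psychtoolbox, a display, and a readable kernel log):
%   Priority(1); DotDemo; Priority(0);
%   [status, logText] = system('dmesg');
%   hits = findHangLines(logText);
%   if isempty(hits)
%     disp('No GPU hang messages found.');
%   else
%     disp(char(hits));
%   end
```

The helper returns a cell array with one entry per matching log line, so an empty result means no line contained any of the keywords.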

thanks,
-mario


Dear Mario,

thank you for the explanation! This was very helpful. We will switch to the Intel driver then. Yes, redundancy is great :)

> Btw. on that machine with Mesa 17.2.4, could you do me a favor and run a script while at least Priority(1) is used? E.g., Priority(1); DotDemo; Priority(0)? Also on the Mesa 18.0.5 machine? And let me know how much RAM those machines have?
I’m away from both machines today. Will test on Monday.

I tested on my MacBook Air (Ivybridge CPU/GPU) with 3.8 GiB RAM. With neither Mesa 17.2.8 nor 18.0.5 did I see any freezes or other problems in multiple runs, and there were no reports of problems in dmesg.

Thanks! Best,
Andreas

> On 12.07.2018 at 19:42, mario.kleiner@... [PSYCHTOOLBOX] <PSYCHTOOLBOX@yahoogroups.com> wrote:
>
> [...]