Limit for flip when timestamp on Linux with Intel GPU?

andreaswidmann · July 12, 2018, 1:03pm

Dear Mario and list,

Is there possibly an upper limit for the when timestamp passed to Screen('Flip', win, when) on (somewhat older) Intel GPUs? When using timestamps more than about 17 seconds in the future, we consistently get the following error on two different machines:

PTB-ERROR: OpenML timestamping reports that flip completed before its requested target time [Target no earlier than 1531396898.530205 secs, completed at 1531396881.867865 secs]!
PTB-ERROR: Something is wrong with swap scheduling, a misconfiguration or potential graphics driver bug!Check your system setup!
PTB-ERROR: Switching to alternative fallback scheduling method.

This happens on two Linux computers (one PC with Ubuntu 16.04, Sandybridge CPU/GPU, kernel 4.13.0-26-lowlatency, Mesa 17.2.4, PTB build date Oct 3, 2017, Octave 4.0; one laptop with Ubuntu 18.04, Ivybridge CPU/GPU, kernel 4.15.0-23-lowlatency, Mesa 18.0.5, PTB build date Apr 6, 2018, Octave 4.2). It does not happen on a PC with very similar specs but with Nvidia graphics card and Nouveau driver.

Minimal code to reproduce the problem is:
winptr = Screen( 'OpenWindow', 0 )
t0 = Screen( 'Flip', winptr )
t1 = Screen( 'Flip', winptr, t0 + 16.679 )
sca

If I reduce the 16.679 s delay by one ms to 16.678 s, no error occurs (on the laptop; the exact limit was not tested on the other PC, so it might be machine-specific). The behavior is consistent across reboots and Octave restarts. I did not find any previous mentions of this error on the mailing list. I hope I didn't miss anything obvious.

Any idea? Thank you!
Andreas

mariokleiner · July 12, 2018, 5:42pm

Hi Andreas,

Indeed, there is a bug in the modesetting-ddx video driver, which is used for Intel graphics chips on Ubuntu 18.04 LTS and also on some recent versions of Ubuntu 16.04, maybe starting around 16.04.3 or 16.04.4 LTS. This bug causes any swap scheduled >= 1000 video refresh cycles into the future to execute immediately. Your 16.679 s (bad) vs. 16.678 s (good) probably straddles that limit, i.e., 1000 frames vs. 999 frames, given a ~60 Hz display. The bug went unnoticed by me for quite a while because I usually only test up to 10 seconds into the future.
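As a rough check of the 1000-frame explanation, the sketch below computes which refresh rates would put 16.678 s just under and 16.679 s at or above 1000 refresh cycles. It assumes the frame count is simply delay times refresh rate and ignores any rounding inside PTB or the driver; the variable names are illustrative only.

```octave
% Which refresh rates put 16.678 s below and 16.679 s at/above 1000 frames?
% Assumption: frames ahead = delay * refresh rate, with no driver rounding.
limitFrames = 1000;               % limit named in the reply above
goodDelay = 16.678;               % seconds, no error reported
badDelay = 16.679;                % seconds, error reported

hzLow = limitFrames / badDelay;   % slowest rate at which badDelay reaches 1000 frames
hzHigh = limitFrames / goodDelay; % rate at which goodDelay would reach 1000 frames too

fprintf('Refresh rate consistent with the report: %.4f Hz to %.4f Hz\n', hzLow, hzHigh);
fprintf('At exactly 60 Hz the limit would be reached at %.3f s\n', limitFrames / 60);
```

On a real setup, the measured refresh interval from Screen('GetFlipInterval', winptr) could be used in place of an assumed rate.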

I found and fixed the bug in early May for X-Server 1.20, but current Ubuntu still ships with server 1.19.6:

https://cgit.freedesktop.org/xorg/xserver/commit/?id=73f0ed2d928afc692ed057eb3d7627328a6e5b12

I haven't gotten around to submitting the same fix for server 1.19 yet, but your report answers the question of whether this matters in practice and is worth the effort even for a soon-to-be-retired server version.

One simple way around this is to use XOrgConfCreator + XOrgConfSelector to create an xorg.conf file that selects the intel-ddx video driver instead of the buggy modesetting-ddx video driver -- nice that we have these redundancies. I think you have to ask XOrgConfCreator to let you configure special features, and when it asks whether you want to use the new modesetting-ddx, answer (n)o instead of (y)es or (d)on't care. Then: Save -> Select config -> logout -> login.

PTB will also fix the problem itself: after it screws up the first such flip and prints the error message you quoted, it switches to the "alternative fallback scheduling method". So our advanced error detection and correction on Linux works as desired. Triple redundancy is a nice thing.
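Besides the driver switch and PTB's own fallback, a script can avoid requesting a swap 1000 or more frames ahead by sleeping first and scheduling the flip only shortly before the deadline. This is a sketch, not something proposed in the thread: maxFramesAhead and the wakeTime helper are hypothetical, and the Psychtoolbox part only runs if the Screen mex file is on the path and a display is available.

```octave
% Sketch: request the flip no more than maxFramesAhead refresh cycles early.
% maxFramesAhead is an assumed safety margin, not a value from the thread.
maxFramesAhead = 300;
wakeTime = @(tWhen, ifi) tWhen - maxFramesAhead * ifi;  % when to stop sleeping

if exist('Screen') == 3                        % Psychtoolbox mex file found
  winptr = Screen('OpenWindow', 0);
  ifi = Screen('GetFlipInterval', winptr);     % measured refresh interval in s
  t0 = Screen('Flip', winptr);
  tWhen = t0 + 16.679;                         % target from the failing case
  WaitSecs('UntilTime', wakeTime(tWhen, ifi)); % plain sleep, no swap scheduled yet
  t1 = Screen('Flip', winptr, tWhen);          % now about maxFramesAhead frames ahead
  fprintf('Flip completed %.3f ms after target\n', 1000 * (t1 - tWhen));
  sca;
end
```

With a ~60 Hz display, 300 frames corresponds to about 5 s of lead time, which stays well inside the range that was reported to work.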

Btw. on that machine with Mesa 17.2.4, could you do me a favor and run a script while at least Priority(1) is used? E.g., Priority(1); DotDemo; Priority(0)? Also on the Mesa 18.0.5 machine? And let me know how much RAM those machines have? I know of a likely Mesa bug introduced sometime around Mesa 18.0.0 with Intel graphics that causes gfx-chip lockups on some chips with some machines when Priority(1) is used to lock system memory. Symptoms are freezes of multiple seconds during visual stimulation and "gfx chip lockup" messages in "dmesg" output (drm error ... ecode soandso ...Hang in rcs detected, ......, resetting chip ...).

That bug also has an easy and effective workaround. It slipped through my testing for too long due to lack of time and suitable hardware. Tracking it down may be tedious, but could be somewhat less tedious if I can narrow down which Mesa versions are affected/unaffected, or whether the amount of installed RAM triggers/prevents it. I know Mesa 18.0.0 has the bug, and Mesa 10.1.3 and maybe 17.0 didn't have it.

thanks,

-mario

andreaswidmann · July 13, 2018, 9:16am

Dear Mario,

thank you for the explanation! This was very helpful. We will switch to the Intel driver then. Yes, redundancy is great :)

> Btw. on that machine with Mesa 17.2.4, could you do me a favor and run a script while at least Priority(1) is used? E.g., Priority(1); DotDemo; Priority(0)? Also on the Mesa 18.0.5 machine? And let me know how much RAM those machines have?

I’m away from both machines today. Will test on Monday.

I tested on my MacBook Air (Ivybridge CPU/GPU) with 3.8 GiB RAM. With neither Mesa 17.2.8 nor 18.0.5 did I see any freezes or other problems in multiple runs, and there were no reports of problems in dmesg.

Thanks! Best,
Andreas
