Does Screen('Flip') work by queueing up all draw commands since last flip?

I would like to ask a question about how Screen('Flip') works since it's behaving a bit differently than I expected. In tests I'm finding that the length of time Flip waits (or takes) to execute is dependent on how many times the screen buffer is drawn to between flips, even if those draw commands are overwriting the same pixels. (To be clear: this is not how long it takes to call the draw commands, rather how long it takes to call just the Flip command itself.)


I always thought of the screen buffer as a sort of texture that draw commands take time to draw to, and that when you call Flip it just copies the current state of each pixel in that "texture" to the screen. If so, I thought it wouldn't matter to Flip how many times a given pixel has been drawn to.


However, what I'm getting in tests is more consistent with draw commands only being queued up when called and then later sequentially executed by Flip itself, even if they sequentially overwrite the same pixels. For example, if I draw a square using FillRect once between each flip and then time how long Flip takes, I get about 16 ms, i.e. the length of a refresh interval on my 60 Hz screen, as expected (most of Flip's time is spent waiting for the next refresh before flipping and returning). But if I call FillRect 100 times (on the same pixels) between each flip, suddenly Flip itself (not the FillRect commands) takes about 31 ms to execute, i.e. two whole refresh intervals. For 200 FillRects between each flip, Flip takes about 46 ms. And so on.


Am I right that draw commands queue up and are sequentially executed by Flip--or something along those lines? If so, a follow-up question is: is there a way to simply and quickly clear/reset Flip's buffer without actually flipping it to screen?


Thank you!

Please post your hardware and software specs, and your testing code.

Sure, I'm on a 2012 MacBook Pro (non-retina display) with integrated Intel graphics. MATLAB 2015b and PTB version 3.0.13 - Flavor: beta. The following code shows the behaviour on my machine. I get median times to execute Flip of about 16, 47, 78, and 138 ms respectively (these numbers will probably vary on other machines depending on their refresh interval and speed, but hopefully the general increase in time to execute Flip will still hold).


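% Open a full-screen window on screen 0 with a black background.
w = Screen('OpenWindow', 0, [0 0 0]);

% Condition 1: a single FillRect between flips.
disp('1 fillrect draw between flips');
dd1 = zeros(1,100);
for i = 1:100
    Screen('FillRect', w, [100 100 100]);
    t1 = GetSecs;              % start timing just before Flip
    t2 = Screen('Flip', w);    % Flip returns the flip timestamp
    dd1(i) = t2-t1;            % time attributed to Flip alone
end
median(dd1)

% Condition 2: 200 redundant FillRects between flips.
disp('200 fillrect draws between flips');
dd200 = zeros(1,100);
for i = 1:100
    for j = 1:200
        Screen('FillRect', w, [100 100 100]);
    end
    t1 = GetSecs;
    t2 = Screen('Flip', w);
    dd200(i) = t2-t1;
end
median(dd200)

% Condition 3: 400 redundant FillRects between flips.
disp('400 fillrect draws between flips');
dd400 = zeros(1,100);
for i = 1:100
    for j = 1:400
        Screen('FillRect', w, [100 100 100]);
    end
    t1 = GetSecs;
    t2 = Screen('Flip', w);
    dd400(i) = t2-t1;
end
median(dd400)

% Condition 4: 800 redundant FillRects between flips.
disp('800 fillrect draws between flips');
dd800 = zeros(1,100);
for i = 1:100
    for j = 1:800
        Screen('FillRect', w, [100 100 100]);
    end
    t1 = GetSecs;
    t2 = Screen('Flip', w);
    dd800(i) = t2-t1;
end
median(dd800)

% Close the window.
sca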

Oh and OSX El Capitan. Forgot to say that. Thanks.

Mario will have more definitive information, but I would guess that the behaviour you are seeing is due to the crappy integrated graphics in your machine (is it a 13" MBP with Intel 4000 graphics?), or a video driver bug.

Are you using any external monitors? How much RAM does your machine have?

And do you have the PsychToolboxKernelDriver installed?

Thanks! Yep, that's right. 13", Intel 4000. This one has 10 GB RAM and a solid state hard drive. Whether or not I attach a second screen doesn't affect the results I get.

If I put a Screen('DrawingFinished') followed by a sufficiently long WaitSecs before the Flip call, the slowness of Flip goes away. A WaitSecs on its own, without Screen('DrawingFinished'), does not help. My (amateur) guess is that the FillRect commands stack up in a queue and that queue does not start to execute until *either* DrawingFinished or Flip is called.

My machine does not show that behaviour, so it's either a limitation of your machine, or a bug.

This is greatly simplified, but as a mental model: When you execute Screen or OpenGL drawing commands, the graphics driver translates those into low-level commands which your graphics hardware can understand and then submits them into a command queue. At the discretion of the driver, command buffers from that queue get transmitted to the hardware into another hardware queue, from which the GPU then pulls and processes commands at its own pace. In that sense the drawing commands are "queued" in memory after processing by the graphics driver, and then sent to the GPU's own second queue for parallel background processing when the driver decides to do so.

The driver will submit batches of commands to the GPU when the in-memory queue is filled to some threshold and the hardware queue has enough space. It will usually also submit when the queue is only partially full if the driver decides it is efficient to do so, or after some driver-dependent timeout if the queue holds only a few commands, too few for efficient operation. Screen('DrawingFinished'), or a Screen('Flip') under certain circumstances, e.g., when you specify a Flip 'when' target time in the future, will cause PTB to execute a glFlush() command to forcibly flush the software OpenGL command queue. At the latest, the driver will flush the queue when a Screen('Flip') is pending, as obviously all drawing work must complete before a newly drawn image can be flipped onto the scanout.
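
As a rough illustration of the flushing behaviour described above, the following sketch submits a batch of redundant FillRects, optionally calls Screen('DrawingFinished'), waits on the CPU, and then times the Flip. It assumes a single display at screen 0; the names and values nDraws, nFrames and waitTime are arbitrary choices made for illustration, and the outcome will depend on OS, driver and GPU.

```matlab
% Sketch: does an explicit flush via DrawingFinished let GPU work overlap
% with a CPU-side wait? All parameter values are illustrative assumptions.
nDraws   = 200;     % redundant full-window fills per frame
nFrames  = 50;      % frames timed per condition
waitTime = 0.050;   % CPU-side wait before Flip, in seconds

w = Screen('OpenWindow', 0, [0 0 0]);
flipDur = zeros(2, nFrames);   % row 1: no flush, row 2: with flush

for useFlush = 0:1
    for i = 1:nFrames
        for j = 1:nDraws
            Screen('FillRect', w, [100 100 100]);
        end
        if useFlush
            % Tell PTB that drawing for this frame is complete, so the
            % queued commands can be handed to the GPU now.
            Screen('DrawingFinished', w);
        end
        WaitSecs(waitTime);        % time during which the GPU may render
        t1 = GetSecs;
        t2 = Screen('Flip', w);
        flipDur(useFlush + 1, i) = t2 - t1;
    end
end
sca;

fprintf('Median Flip duration without DrawingFinished: %.1f ms\n', ...
    1000 * median(flipDur(1, :)));
fprintf('Median Flip duration with DrawingFinished:    %.1f ms\n', ...
    1000 * median(flipDur(2, :)));
```

If the driver only starts submitting work once it is flushed, the second median should come out noticeably smaller than the first; on drivers that flush early on their own, the two may be similar.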

Most of the details of this behaviour are highly OS/driver/GPU dependent, but in general this means you can't time the execution of Screen() or glXXX drawing commands via tic; toc; or GetSecs, as you are only measuring command submission time on the CPU, not the actual rendering work on the GPU. Our DrawingSpeedTest script has an optional flag 'gpuMeasurement' you can set to 1 to measure actual execution time on the GPU. You can obviously do the same if you implement that code in your own script.
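
A minimal sketch of such a GPU-side measurement in your own script might look like the following. It relies on my reading of the Screen('GetWindowInfo') help text, namely that infoType 5 starts a GPU timing query and that the result later appears in the GpuTimeElapsed field; please check "Screen GetWindowInfo?" on your installation, since support depends on OS, driver and GPU. The variable names and the one-second polling limit are assumptions made for illustration.

```matlab
% Sketch: measure GPU render time for one frame of redundant fills.
nDraws = 200;   % illustrative workload size
w = Screen('OpenWindow', 0, [0 0 0]);

% Ask PTB to time GPU work for the upcoming frame. A nonzero return value
% is taken to mean that GPU timing is supported on this setup.
supported = Screen('GetWindowInfo', w, 5);

for j = 1:nDraws
    Screen('FillRect', w, [100 100 100]);
end
Screen('Flip', w);

gpuTime = NaN;   % stays NaN if timing is unsupported or never arrives
if supported
    % The result may not be available immediately, so poll for a while.
    deadline = GetSecs + 1;
    while GetSecs < deadline
        winfo = Screen('GetWindowInfo', w);
        if winfo.GpuTimeElapsed > 0
            gpuTime = winfo.GpuTimeElapsed;   % seconds of GPU time
            break;
        end
        WaitSecs(0.001);
    end
end
sca;

if isnan(gpuTime)
    disp('GPU timing not available on this setup.');
else
    fprintf('GPU time for %d FillRects: %.2f ms\n', nDraws, 1000 * gpuTime);
end
```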

The numbers you get for an Intel HD-4000 Ivybridge are expected for the workload you submit to such a GPU with a theoretical maximum memory bandwidth of 25 GB/sec.
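
A back-of-the-envelope check of that claim can be sketched as follows. The thread does not state the display resolution or pixel format, so the 1280 x 800 window and 4 bytes per pixel below are assumptions, as is the simplification that each FillRect writes every pixel exactly once and that Flip completes at the first refresh after rendering ends.

```matlab
% Rough lower bound on Flip duration from memory bandwidth alone.
% Assumed values (not given in the thread): resolution and pixel size.
width         = 1280;
height        = 800;
bytesPerPixel = 4;
refreshHz     = 60;      % from the original post
bandwidth     = 25e9;    % bytes/s, the theoretical maximum quoted above

nFills = [1 200 400 800];                                  % tested workloads
bytesWritten = nFills * width * height * bytesPerPixel;    % bytes per frame
renderTime   = bytesWritten / bandwidth;                   % seconds, best case

% Flip cannot complete before rendering is done, so round up to whole
% refresh intervals.
framesNeeded = ceil(renderTime * refreshHz);
minFlipMs    = 1000 * framesNeeded / refreshHz;

for k = 1:numel(nFills)
    fprintf('%4d fills: >= %5.1f ms render, >= %5.1f ms Flip\n', ...
        nFills(k), 1000 * renderTime(k), minFlipMs(k));
end
```

Under these assumptions the lower bounds on Flip duration come out at about 16.7, 33.3, 66.7 and 133.3 ms, so the reported medians of roughly 16, 47, 78 and 138 ms are close to or above what peak bandwidth alone would allow, which real hardware rarely reaches.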

Oh and in general, if you overdraw a pixel n times, then the GPU will execute the drawing n times, although there are tons of optimizations in modern drivers and hardware that will avoid redundant operations, or at least reduce their execution cost, under certain circumstances. Most of these optimizations apply to typical 3D rendering with a depth buffer though, not to pure 2D drawing as you'd typically do with Screen() drawing commands.

One example of such an optimization relating to your script: PTB will translate a FillRect for the complete onscreen window into a glClear() call under many - but not all - circumstances, as glClear is potentially optimized by the hardware to be faster than just filling a "normal" rectangle with pixels of a certain color. Graphics drivers may optimize that glClear further under suitable circumstances. E.g., the Intel graphics driver on Linux will execute a fast-clear on modern Intel graphics chips like yours, iff the clear color is selected to clear to either black (r,g,b = 0,0,0) or white, to speed up the operation even more. It will also detect that you are clearing the framebuffer redundantly with the same color and only execute your first FillRect, discarding all others as redundant. The Intel graphics driver on all versions of OSX isn't so clever: it executes a slower clear on the same hardware and doesn't detect redundancy. Performance can be highly OS dependent this way.

Hope it helps,
-mario

Perfect, yes, thanks! Actually I had started to piece some of this together from some of the slides in the PTB 3 slideshow that is included with PTB. Unfortunately, the most relevant looking slides were the ones with only pictures and no text, so I still wasn't sure. This clears it up. Now that I know how it works underneath I've coded it a different way so it's no longer a problem.

Thanks!