I am thinking of having some audio as a stimulus. Say I want to play the audio and ask the participant to repeat it. The audio will have a known length, say 2 seconds. However, I am also interested in how long it takes the participants to repeat the audio. Is there a way I can do this with Psychtoolbox? I was thinking of syncing the mic, but I am not too sure how this would work…
I think the problem I am having now is that the subjects’ responses vary (since they are different people), so I am not too sure how to even mark this as a reaction time…
Processing the audio is outside the scope of this forum. Perhaps there are tools for this online; in the worst case you’ll have to do it manually. Ideally you wouldn’t use speech as a response modality unless that’s critical to your experiment.
My first Google hit is this open-access article about a free-to-use Matlab toolbox from 2016:
A way more fancy approach than simple thresholding. This is in the domain of offline post-processing though, so you’d have to combine our audio capture timestamps for recorded wav files with the delay computed by such an algorithm.
Thanks for the paper. I tried it and it gives me one number; I am guessing that it’s the onset latency. I have not figured out the offset yet… Any thoughts on this?
Meanwhile, I also tried using GetSecs to compute the timing. I put time_onset=GetSecs; before the audio plays and another GetSecs call after it plays. However, it seems to me this only measures the script’s running time rather than the actual subject’s response. Any thoughts on this?
You need to think a little harder yourself about this. The time of audio playback is not going to give you the subject’s response, of course, because that is a separate thing. Regarding the first question, see if you can adapt their approach to detect the offset from the recorded audio, or otherwise google for speech offset detection or something like that.
Is it legit to approximate the response time based on the actual sound file? Say the sound file is 1 s, could I assume that the subject will take about 1 s to repeat that sound?
I am thinking that for the onset and offset, it should be time_onset=GetSecs-timestart (before the audio plays) and time_offset=GetSecs-timestart (after the audio plays)?
GetSecs is almost always the wrong approach when visual or auditory stimulation is involved.
PsychPortAudio provides you with audio onset and offset timestamps, as optional return arguments of the ‘Start’ [‘startTime’] and ‘Stop’ [‘estStopTime’] functions. When capturing sound it also provides such timestamps, e.g., as part of the ‘GetAudioData’ function.
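For playback, a minimal sketch (48 kHz stereo is assumed here; ‘mystim’ stands in for your own 2-row stimulus matrix):

pahandle = PsychPortAudio('Open', [], 1, 1, 48000, 2);        % playback-only, low-latency mode
PsychPortAudio('FillBuffer', pahandle, mystim);
startTime = PsychPortAudio('Start', pahandle, 1, 0, 1);       % wait for start, get true audio onset time
[~, ~, ~, estStopTime] = PsychPortAudio('Stop', pahandle, 1); % wait for end of playback, get audio offset time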
BasicSoundInputDemo.m shows how to get a sound onset timestamp, based on the simple thresholding approach, called ‘tOnset’ in that demo, and a sound vector ‘recordedaudio’ that it also optionally writes to a .wav file. If you feed that into some fancy algorithm and it tells you voice onset was dT seconds into the sound file, then:
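% Assuming, as in the demo, that the saved sound vector starts at the
% thresholded trigger sample, the refined voice onset would be:
trueVoiceOnset = tOnset + dT;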
Obviously you could calculate (tOnset - estStopTime) and (tOnset - startTime) already at the end of a trial and save that “preliminary” RT to a file, and then, after offline voice processing with “fancy Algorithm™”, add dT to the stored value to get the final RT.
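In code, that bookkeeping could look roughly like this (the variable names and results file are illustrative):

prelimRT = tOnset - estStopTime;     % response delay since end of playback
prelimRTonset = tOnset - startTime;  % response delay since start of playback
save('trial01_rt.mat', 'prelimRT', 'prelimRTonset'); % hypothetical per-trial results file

% Later, after offline voice processing yields dT for this trial:
finalRT = prelimRT + dT;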
Essentially you use our driver’s timestamping and the simple thresholding with a rather low threshold to get the rough response time, and then compute the additional offset to add with the fancy method.
At least that’s how I would do it. This all depends on the precision of our audio driver’s timestamps, and that depends on the underlying operating system and sound hardware. One should always verify this at least once with independent measures, but as far as my testing over many years goes: on Linux and macOS with built-in sound chips those timestamps are typically sub-millisecond accurate (typically < 0.6 msecs); on Windows 10 with onboard HDA sound they can also be about millisecond accurate; older Windows versions were worse. So all in all, with errors in the ~1 msec range, most of your error will be produced by the “fancy voice onset detection” algorithm. Of course your mileage may vary if you choose unsuitable audio hardware.
Thanks for the response! I have not had time lately, sorry about that. I took another look at the example BasicSoundInputDemo.m. That example shows (partially) what I need. For my stimulus, I already use the sound() function and audiowrite to achieve what I want. I was really hoping to combine this with Psychtoolbox to get the onset and offset timestamps for each stimulus. My stimulus consists of a series of audio files, and I play them in a for loop on each trial. I was just wondering how to incorporate your suggestion of these two lines of code into mine:
Based on the example, by using PsychPortAudio, I achieved something like this when I was testing.
Audio capture started, press any key for about 1 second to quit.
This is some status output of PsychPortAudio:
Active: 1
State: 2
RequestedStartTime: 0
StartTime: 9.8742e+04
CaptureStartTime: 9.8742e+04
RequestedStopTime: 1.7977e+308
EstimatedStopTime: 0
CurrentStreamTime: 0
ElapsedOutSamples: 0
PositionSecs: 0
RecordedSecs: 2.0800
ReadSecs: 1.0200
SchedulePosition: 0
XRuns: 0
TotalCalls: 208
TimeFailed: 0
BufferSize: 480
CPULoad: 0.0053
PredictedLatency: 0.0100
LatencyBias: 0
SampleRate: 48000
OutDeviceIndex: -1
InDeviceIndex: 5
However, I am more interested in onset and offset, so I figure this may help:
% Find the first sample that exceeds the voice trigger threshold:
idx = min(find(abs(audiodata(1,:)) >= voicetrigger)); %#ok
% Translate that sample index into an absolute voice onset timestamp:
tOnset = tCaptureStart + ((offset + idx - 1) / freq);
I will try to combine this into my script. Do you have suggestions on whether this would achieve my goal? Thanks
As the help text for GetAudioData explains, tCaptureStart is the time when audio recording started, and offset is the offset of the returned data vector relative to that start time, in sample frames. idx is the position at which the minimum loudness threshold was first exceeded, also in sample frames. So (offset + idx - 1) / freq translates that sample frame offset into a time offset to add to the capture start time, and you get the absolute time of “voice onset” for that very simple thresholding. On top of that you would use the fancy toolbox to compute dT, as in my post above, relative to the start of the sound vector you stored, to get a more accurate voice onset time.
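Put together, a capture loop modeled on BasicSoundInputDemo.m might look roughly like this (the 10 second buffer and the 0.1 trigger threshold are placeholder values you would tune):

freq = 48000;
voicetrigger = 0.1;                                % amplitude threshold, tune for your mic
parec = PsychPortAudio('Open', [], 2, 1, freq, 1); % capture-only device, mono
PsychPortAudio('GetAudioData', parec, 10);         % preallocate 10 s internal buffer
PsychPortAudio('Start', parec, 0, 0, 1);

recordedaudio = [];
tOnset = [];
while isempty(tOnset)
    % Fetch whatever has been captured so far, with timestamps:
    [audiodata, offset, ~, tCaptureStart] = PsychPortAudio('GetAudioData', parec);
    idx = min(find(abs(audiodata(1,:)) >= voicetrigger)); %#ok
    if ~isempty(idx)
        % Translate sample index into absolute time of voice onset:
        tOnset = tCaptureStart + ((offset + idx - 1) / freq);
    end
    recordedaudio = [recordedaudio audiodata]; %#ok
end
% ... keep fetching until the response is over, then stop and save the .wav:
PsychPortAudio('Stop', parec);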
startTime and estStopTime would be the values returned by PsychPortAudio(‘Start’) and PsychPortAudio(‘Stop’) for the playback code you need to write for playing your sound stimulus with PsychPortAudio. tOnset is the one from BasicSoundInputDemo.m; dT is what that fancy toolbox would give you.
You must use PsychPortAudio for everything, also for playback, otherwise any playback timing or timestamps will be inaccurate. Matlab’s sound() will not cut it.
Once you’ve done that, you will have tOnset, startTime and estStopTime at the end of your trial, so you can already compute a “preliminary” sound reaction time (tOnset - estStopTime) etc. and store that away with the .wav file of the recorded sound. Then (offline, I assume, if this is somewhat compute-intense) you can feed that file into the fancy toolbox, get dT from it, and add it to the “preliminary” RT you already have, to get the final voiceResponseDelaySinceAudioOffset or voiceResponseDelaySinceAudioOnset that you actually wanted.
To open the same sound card in full-duplex mode, to play back sounds and capture voice responses after each played sound, you’d open the card as a master device, then create two slave devices (PsychPortAudio(‘OpenSlave’, …)), one for playback and the other for capture, both attached to the master. This way you can control playback and recording independently. Use of such slave devices is demonstrated, e.g., in BasicAMAndMixScheduleDemo.m.
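A minimal sketch of that setup, assuming a full-duplex capable card (the [2 1] channel spec, meaning 2 output and 1 input channels, and the 48 kHz rate are just example values):

freq = 48000;
% Open as master (+8) for both playback (1) and capture (2):
pamaster = PsychPortAudio('Open', [], 1+2+8, 1, freq, [2 1]);
PsychPortAudio('Start', pamaster, 0, 0, 1);        % the master must be running

paplay = PsychPortAudio('OpenSlave', pamaster, 1); % playback slave
parec  = PsychPortAudio('OpenSlave', pamaster, 2); % capture slave

% Now drive paplay with 'FillBuffer' / 'Start' / 'Stop' for each stimulus,
% and parec with 'GetAudioData' for the voice recording, independently.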
You could also just open the soundcard for playback, play the sound and get the timestamps, then close it, then open the soundcard for recording, record the sound and get the timestamps, then close it. In each trial. As long as subjects don’t respond within milliseconds of end of playback, there should be enough time to close the playback device and get recording set up. But the approach with master and slave devices is more elegant and eliminates such gaps.
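The sequential variant could look roughly like this inside your trial loop (a sketch only; ‘mystim’ is again a placeholder):

% Play with a playback-only device, then tear it down:
paplay = PsychPortAudio('Open', [], 1, 1, 48000, 2);
PsychPortAudio('FillBuffer', paplay, mystim);
PsychPortAudio('Start', paplay, 1, 0, 1);
[~, ~, ~, estStopTime] = PsychPortAudio('Stop', paplay, 1);
PsychPortAudio('Close', paplay);

% Then open the card for capture and run the recording loop from above:
parec = PsychPortAudio('Open', [], 2, 1, 48000, 1);
PsychPortAudio('GetAudioData', parec, 10);
PsychPortAudio('Start', parec, 0, 0, 1);
% ... voice capture and thresholding as in the earlier sketch ...
PsychPortAudio('Stop', parec);
PsychPortAudio('Close', parec);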
I’m afraid though, we don’t have a specific demo for your use case of “play a sound, give timestamps, then record sound with timestamps”, so you’ll have to string this together yourself from existing demos. Or wait until someone finds the time to write a demo.