1. Introduction
This section is non-normative.
This is a proposal to add a requestVideoFrameCallback() method to HTMLVideoElement.
This method allows web authors to register a callback which runs in the rendering steps, when a new video frame is sent to the compositor. The new callbacks are executed immediately before existing window.requestAnimationFrame() callbacks. Changes made from within both callback types within the same turn of the event loop will be visible on screen at the same time, with the next v-sync.
Drawing operations (e.g. drawing a video frame to a canvas via drawImage()) made through this API will be synchronized as a best effort with the video playing on screen. Best effort in this case means that, even with a normal workload, a callback can occasionally fire one v-sync late, relative to when the new video frame was presented. This means that drawing operations might occasionally appear on screen one v-sync after the video frame does. Additionally, if there is a heavy load on the main thread, a callback might not fire for every frame (as measured by a discontinuity in presentedFrames).
Note: A web author can tell that a callback is late by checking whether expectedDisplayTime is equal to now, as opposed to roughly one v-sync in the future.
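The two observations above (occasionally late callbacks, occasionally missed frames) can both be checked from inside a callback. Below is a non-normative sketch; the helper names and the 4ms tolerance are illustrative assumptions, not part of this spec.

```javascript
// Sketch (non-normative): detecting late callbacks and missed frames from the
// metadata argument. Helper names and the 4ms tolerance are illustrative.
let lastPresentedFrames = 0;

// A callback is "late" when the frame it describes is already on screen,
// i.e. expectedDisplayTime is roughly now rather than one v-sync ahead.
function isLate(now, metadata, toleranceMs = 4) {
  return metadata.expectedDisplayTime - now <= toleranceMs;
}

// Frames were missed when presentedFrames advanced by more than one
// since the previous callback.
function missedFrames(metadata) {
  const missed = lastPresentedFrames === 0
    ? 0
    : metadata.presentedFrames - lastPresentedFrames - 1;
  lastPresentedFrames = metadata.presentedFrames;
  return Math.max(0, missed);
}
```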
The VideoFrameRequestCallback also provides useful metadata about the video frame that was most recently presented for composition, which can be used for automated metrics analysis.
2. VideoFrameMetadata
dictionary VideoFrameMetadata {
  required DOMHighResTimeStamp presentationTime;
  required DOMHighResTimeStamp expectedDisplayTime;
  required unsigned long width;
  required unsigned long height;
  required double mediaTime;
  required unsigned long presentedFrames;
  double processingDuration;
  DOMHighResTimeStamp captureTime;
  DOMHighResTimeStamp receiveTime;
  unsigned long rtpTimestamp;
};
2.1. Definitions
media pixels are defined as a media resource’s visible decoded pixels, without pixel aspect ratio adjustments. They are different from CSS pixels, which account for pixel aspect ratio adjustments.
2.2. Attributes
presentationTime, of type DOMHighResTimeStamp
    The time at which the user agent submitted the frame for composition.
expectedDisplayTime, of type DOMHighResTimeStamp
    The time at which the user agent expects the frame to be visible.
width, of type unsigned long
    The width of the video frame, in media pixels.
height, of type unsigned long
    The height of the video frame, in media pixels.
Note: width and height might differ from videoWidth and videoHeight in certain cases (e.g., an anamorphic video might have rectangular pixels). When calling texImage2D(), width and height are the dimensions used to copy the video’s media pixels to the texture, while videoWidth and videoHeight can be used to determine the aspect ratio to use, when using the texture.
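As a non-normative illustration, the pixel aspect ratio implied by comparing the two sets of dimensions can be computed as follows. The helper name is an assumption; the numbers in the usage note correspond to 1440×1080 anamorphic content displayed at 16:9.

```javascript
// Sketch (non-normative): the pixel aspect ratio implied by comparing the
// metadata's media-pixel dimensions to videoWidth/videoHeight.
function pixelAspectRatio(metadata, videoWidth, videoHeight) {
  const displayAspect = videoWidth / videoHeight;         // CSS-pixel aspect
  const storageAspect = metadata.width / metadata.height; // media-pixel aspect
  return displayAspect / storageAspect; // 1.0 means square pixels
}
```

For example, 1440×1080 media pixels displayed as 1920×1080 implies a pixel aspect ratio of 4/3, i.e. each media pixel is one third wider than it is tall.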
mediaTime, of type double
    The media presentation timestamp (PTS) in seconds of the frame presented (e.g. its timestamp on the video.currentTime timeline). MAY have a zero value for live-streams or WebRTC applications.
presentedFrames, of type unsigned long
    A count of the number of frames submitted for composition. Allows clients to determine if frames were missed between VideoFrameRequestCallbacks. MUST be monotonically increasing.
processingDuration, of type double
    The elapsed duration in seconds from submission of the encoded packet with the same presentation timestamp (PTS) as this frame (e.g. the same as mediaTime) to the decoder, until the decoded frame was ready for presentation. In addition to decoding time, this may include processing time, e.g. YUV conversion and/or staging into GPU-backed memory.
    SHOULD be present. In some cases, user agents might not be able to surface this information, since portions of the media pipeline might be owned by the OS.
captureTime, of type DOMHighResTimeStamp
    For video frames coming from a local source, this is the time at which the frame was captured by the camera. For video frames coming from a remote source, the capture time is based on the RTP timestamp of the frame and estimated using clock synchronization. This is best effort and can use methods like RTCP SR as specified in RFC 3550 Section 6.4.1, or other alternative means if RTCP SR isn’t feasible.
    SHOULD be present for WebRTC or getUserMedia applications, and absent otherwise.
receiveTime, of type DOMHighResTimeStamp
    For video frames coming from a remote source, this is the time the encoded frame was received by the platform, i.e., the time at which the last packet belonging to this frame was received over the network.
    SHOULD be present for WebRTC applications that receive data from a remote source, and absent otherwise.
rtpTimestamp, of type unsigned long
    The RTP timestamp associated with this video frame.
    SHOULD be present for WebRTC applications that receive data from a remote source, and absent otherwise.
3. VideoFrameRequestCallback
callback VideoFrameRequestCallback = undefined(DOMHighResTimeStamp now, VideoFrameMetadata metadata);
Each VideoFrameRequestCallback object has a canceled boolean, initially set to false.
4. HTMLVideoElement.requestVideoFrameCallback()
partial interface HTMLVideoElement {
  unsigned long requestVideoFrameCallback(VideoFrameRequestCallback callback);
  undefined cancelVideoFrameCallback(unsigned long handle);
};
4.1. Methods
Each HTMLVideoElement has a list of video frame request callbacks, which is initially empty. It also has a last presented frame identifier and a video frame request callback identifier, which are both numbers, initially zero.
requestVideoFrameCallback(callback)
    Registers a callback to be fired the next time a frame is presented to the compositor.
    When requestVideoFrameCallback() is called, the user agent MUST run the following steps:
    1. Let video be the HTMLVideoElement on which requestVideoFrameCallback() is invoked.
    2. Increment video’s ownerDocument’s video frame request callback identifier by one.
    3. Let callbackId be video’s ownerDocument’s video frame request callback identifier.
    4. Append callback to video’s list of video frame request callbacks, associated with callbackId.
    5. Return callbackId.
cancelVideoFrameCallback(handle)
    Cancels an existing video frame request callback given its handle.
    When cancelVideoFrameCallback() is called, the user agent MUST run the following steps:
    1. Let video be the target HTMLVideoElement object on which cancelVideoFrameCallback() is invoked.
    2. Find the entry in video’s list of video frame request callbacks that is associated with the value handle.
    3. If there is such an entry, set its canceled boolean to true and remove it from video’s list of video frame request callbacks.
4.2. Procedures
An HTMLVideoElement is considered to be an associated video element of a Document doc if its ownerDocument attribute is the same as doc.
This spec should eventually be merged into the HTML spec, and the video frame request callbacks should then be run directly from the update the rendering steps. This procedure describes where and how to invoke the algorithm in the meantime.
When the update the rendering algorithm is invoked, run this new step:

    For each fully active Document in docs, for each associated video element for that Document, run the video frame request callbacks, passing now as the timestamp.

immediately before this existing step:

    "For each fully active Document in docs, run the animation frame callbacks for that Document, passing in now as the timestamp"

using the definitions for docs and now described in the update the rendering algorithm.
Note: The effective rate at which callbacks are run is the lesser of the video’s rate and the browser’s rate. When the video rate is lower than the browser rate, the callbacks’ rate is limited by the frequency at which new frames are presented. When the video rate is greater than the browser rate, the callbacks’ rate is limited by the frequency of the update the rendering steps. This means that a 25fps video playing in a browser that paints at 60Hz would fire callbacks at 25Hz; a 120fps video in that same 60Hz browser would fire callbacks at 60Hz.
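The rule in the note above reduces to taking the minimum of the two rates; a trivial non-normative helper makes the arithmetic explicit (the function name is an assumption):

```javascript
// Sketch (non-normative): the effective callback rate is the lesser of the
// video frame rate and the browser paint rate.
function effectiveCallbackRate(videoFps, browserHz) {
  return Math.min(videoFps, browserHz);
}
```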
To run the video frame request callbacks for an HTMLVideoElement video with a timestamp now, run the following steps:
1. If video’s list of video frame request callbacks is empty, abort these steps.
2. Let metadata be the VideoFrameMetadata dictionary built from video’s latest presented frame.
3. Let presentedFrames be the value of metadata’s presentedFrames field.
4. If the last presented frame identifier is equal to presentedFrames, abort these steps.
5. Set the last presented frame identifier to presentedFrames.
6. Let callbacks be the list of video frame request callbacks.
7. Set video’s list of video frame request callbacks to be empty.
8. For each entry in callbacks:
    1. If the entry’s canceled boolean is true, continue to the next entry.
    2. Invoke the callback, passing now and metadata as arguments.
    3. If an exception is thrown, report the exception.
Note: There are no strict timing guarantees when it comes to how soon callbacks are run after a new video frame has been presented. Consider the following scenario: a new frame is presented on the compositor thread just as the user agent aborts the algorithm above, having confirmed that there are no new frames. The callbacks therefore won’t run in the current rendering steps, and have to wait until the next rendering steps, one v-sync later. In that case, visual changes to a web page made from within the delayed callbacks will appear on-screen one v-sync after the video frame does.
Offering stricter guarantees would likely force implementers to add cross-thread synchronization, which might be detrimental to video playback performance.
5. Security and Privacy Considerations
This specification does not expose any new privacy-sensitive information. However, the location correlation opportunities outlined in the Privacy and Security section of [webrtc-stats] also hold true for this spec: captureTime, receiveTime, and rtpTimestamp expose network-layer information which can be correlated to location information. E.g., reusing the same example, captureTime and receiveTime can be used to estimate network end-to-end travel time, which can give an indication of how far apart the peers are located, and can give some location information about a peer if the location of the other peer is known. Since this information is already available via RTCStats, this specification doesn’t introduce any novel privacy considerations.
This specification might introduce some new GPU fingerprinting opportunities. processingDuration exposes some under-the-hood performance information about the video pipeline, which is otherwise inaccessible to web developers. Using this information, one could correlate the performance of various codecs and video sizes to a known GPU’s profile. We therefore propose a resolution of 100μs, which is still useful for automated quality analysis, but doesn’t offer any new sources of high resolution information. Still, despite a coarse clock, one could exploit the significant performance differences between hardware and software decoders to infer information about a GPU’s features. For example, this would make it easier to fingerprint the newest GPUs, which have hardware decoders for the latest codecs, which don’t yet have widespread hardware decoding support. However, rather than measuring the profiles themselves, one could directly get equivalent information from MediaCapabilitiesInfo.
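The proposed 100μs resolution can be illustrated by coarsening a duration before exposing it. This is a non-normative sketch; the helper name is an assumption, and durations are in seconds as specified for processingDuration.

```javascript
// Sketch (non-normative): clamping processingDuration to the proposed
// 100μs resolution. Durations are in seconds, so 100μs is 1e-4 s.
function coarsenProcessingDuration(seconds) {
  const RESOLUTION = 1e-4; // 100 microseconds, expressed in seconds
  return Math.round(seconds / RESOLUTION) * RESOLUTION;
}
```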
This specification also introduces some new timing information. presentationTime and expectedDisplayTime expose compositor timing information; captureTime and receiveTime expose network timing information. The clock resolution of these fields should therefore be coarse enough not to facilitate timing attacks.
6. Examples
6.1. Drawing frames at the video rate
This section is non-normative.
Drawing video frames onto a canvas at the video rate (instead of the browser’s animation rate) can be done by using video.requestVideoFrameCallback() instead of window.requestAnimationFrame().
<body>
  <video controls></video>
  <canvas width="640" height="360"></canvas>
  <span id="fps_text"></span>
</body>
<script>
  function startDrawing() {
    var video = document.querySelector('video');
    var canvas = document.querySelector('canvas');
    var ctx = canvas.getContext('2d');

    var paint_count = 0;
    var start_time = 0.0;

    var updateCanvas = function(now) {
      if (start_time == 0.0)
        start_time = now;

      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

      var elapsed = (now - start_time) / 1000.0;
      var fps = (++paint_count / elapsed).toFixed(3);
      document.querySelector('#fps_text').innerText = 'video fps: ' + fps;

      video.requestVideoFrameCallback(updateCanvas);
    };

    video.requestVideoFrameCallback(updateCanvas);

    video.src = "http://example.com/foo.webm";
    video.play();
  }
</script>