Beyond HTML5 - Full-duplex Conversational Voice and Video in Browsers and Web Runtimes
We all know that audio and video can enhance communication. Today, however, there is no simple way to add them to your web application. You need to rely on the user installing a plug-in (Flash, GTalk), or redirect the user to another application (e.g. telephony or Skype). If the browser had native support for live audio and video capture and playout, as well as for stream management and transport, this would be possible without plug-ins. That may not be far off. You can already play out files using the HTML5 <audio> and <video> elements in many browsers. As described in the post Beyond HTML5: Audio Capture in Web Browsers, a new element, <device>, has been suggested to allow the user to grant a web application access to devices such as web cameras and microphones. It can be used in combination with the Stream API for e.g. audio and video capturing. Finally, Web Sockets offer a means of transporting media.
Since the best way to learn about the possibilities and limitations of a new technology is to implement and experiment, we have decided to build a test implementation as part of a research activity. We will take a shot at implementing
- support for access to audio and video capture devices using <device>
- support for streams in the browser with the Stream API to do e.g. capturing
- support for sending and receiving media streams by introducing new JavaScript components
- support for stream transfer using the WebSocket protocol
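To give a feel for what stream transfer over a WebSocket could involve, here is a minimal sketch of packing an encoded media chunk into a binary message and unpacking it again. The frame layout, the constant HEADER_SIZE and the functions packFrame and unpackFrame are our own illustrative assumptions; none of them comes from the WebSocket protocol or the Stream API proposal.

```javascript
// Hypothetical frame layout (an assumption for illustration, not from any spec):
//   byte 0      media kind (0 = audio, 1 = video)
//   bytes 1-4   sequence number, unsigned 32-bit, big-endian
//   bytes 5-8   timestamp in milliseconds, unsigned 32-bit, big-endian
//   bytes 9..   encoded media payload
const HEADER_SIZE = 9;

// Build one binary message from a header and a Uint8Array payload.
function packFrame(kind, seq, timestampMs, payload) {
  const buffer = new ArrayBuffer(HEADER_SIZE + payload.byteLength);
  const view = new DataView(buffer);
  view.setUint8(0, kind);
  view.setUint32(1, seq);          // DataView writes big-endian by default
  view.setUint32(5, timestampMs);
  new Uint8Array(buffer, HEADER_SIZE).set(payload);
  return buffer;
}

// Split a received ArrayBuffer back into header fields and payload.
function unpackFrame(buffer) {
  const view = new DataView(buffer);
  return {
    kind: view.getUint8(0),
    seq: view.getUint32(1),
    timestampMs: view.getUint32(5),
    payload: new Uint8Array(buffer, HEADER_SIZE), // view onto the same buffer, no copy
  };
}
```

In a browser whose WebSocket implementation supports binary messages, the sender could call socket.send(packFrame(...)), and a receiver with socket.binaryType set to "arraybuffer" could pass event.data to unpackFrame. A sequence number and timestamp are included because a receiver would likely need them to detect gaps and to schedule playout.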
Our goal is to demonstrate full-duplex audiovisual communication between two browsers. In the process we will learn more about any obstacles and about what kind of performance you can expect. Our demo setup is shown below. As can be seen, there will be a media relay node involved to simplify NAT/FW traversal. Added benefits are that it would be the natural place to introduce e.g. media mixing (for multiparty sessions) or media transcoding (if required).
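To make the relay's role more concrete, the following is a minimal sketch of the pairing and forwarding logic such a node might run. It is not our implementation: the class MediaRelay and its methods join, forward and leave are hypothetical names, and an "endpoint" is assumed to be any object with a send(frame) method, such as a server-side WebSocket connection.

```javascript
// Hypothetical relay core: pairs two endpoints and forwards frames between them.
// Nothing here depends on a particular WebSocket library; an endpoint only
// needs a send(frame) method.
class MediaRelay {
  constructor() {
    this.waiting = null;     // endpoint waiting for a partner, if any
    this.peers = new Map();  // endpoint -> its partner
  }

  // Register a new endpoint. Returns true if it was paired with a waiting one.
  join(endpoint) {
    if (this.waiting === null) {
      this.waiting = endpoint;
      return false;          // first party: wait for the other one
    }
    const other = this.waiting;
    this.waiting = null;
    this.peers.set(endpoint, other);
    this.peers.set(other, endpoint);
    return true;
  }

  // Forward a media frame from one endpoint to its partner.
  // Mixing or transcoding, if needed, would be applied at this point.
  forward(from, frame) {
    const to = this.peers.get(from);
    if (to === undefined) return false; // no partner yet: drop the frame
    to.send(frame);
    return true;
  }

  // Remove an endpoint; its partner (if any) is left unpaired.
  leave(endpoint) {
    if (this.waiting === endpoint) this.waiting = null;
    const other = this.peers.get(endpoint);
    this.peers.delete(endpoint);
    if (other !== undefined) this.peers.delete(other);
  }
}
```

Because both browsers open outgoing connections to the relay, neither has to accept an incoming connection, which is what simplifies NAT/FW traversal. A multiparty relay would replace the one-to-one map with a group and mix the streams before forwarding.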

We will use WebKit for our work. WebKit is a cross-platform project, and the various ports use different media engines to implement media capabilities. We have chosen the GTK+ port of WebKit, which uses GStreamer as its media engine, as our primary prototyping port. WebKit has the advantage that it is easily available (just download it from WebKit.org). Another advantage is that WebKit is the core of many browsers (Safari, Chrome, Android Browser, Symbian S40 and S60 browsers), which means that if the community thinks that our ideas make sense there could be a path into real browsers.
How much freedom the stream control APIs should give the web developer is open to discussion. Our reasoning is that the APIs should be at a quite low level, leaving freedom and control to the web developer. Attaching a camera could be the start of a visual communication session, but the developer could also have something else in mind, like locally recording a video clip for a video blog or YouTube. This means that the developer should be in charge of what to do with the captured streams.
It could look something like the following HTML body
<p>
    Select audio/video device:
    <device type="video_capture, audio_capture" id="audio_video_device"/>
</p>
<video id="video_player" width="640" height="480"></video>
<input type="button" id="connect_but" value="Connect" disabled>
that basically forces the user to select devices for audio and video capture (this is part of the security model behind the <device> element - without active user selection no audiovisual capturing) and displays a "Connect" button. Combined with this JavaScript logic
var captureStream; // Stream
var videoPlayer;
var connectBut;

window.onload = function () {
    // triggered when the user has granted access to the capturing devices
    document.getElementById("audio_video_device").onchange = function () {
        captureStream = this.data;
        // ready to connect
        connectBut.disabled = false;
    };

    // used to play audio and video from the remote party
    videoPlayer = document.getElementById("video_player");

    connectBut = document.getElementById("connect_but");
    connectBut.onclick = connect;
};

function connect() {
    // create a media stream transceiver
    var transceiver = new MediaStreamTransceiver();

    // set up the transport
    transceiver.transport = new WebSocket("URI of receiver");
    // note: another possibility could be that the transport is set up as
    // part of the media stream transceiver creation to exclude other
    // traffic than media on this WebSocket

    // set the stream from the audio/video capture devices as local stream
    transceiver.localStream = captureStream;

    transceiver.onconnect = function (evt) {
        // render the remote stream
        videoPlayer.src = transceiver.remoteStream.URL;
    };
}
the user can connect audiovisually with another client by clicking "Connect". We have introduced an object called "MediaStreamTransceiver" to handle stream processing.
Of course this example is oversimplified. Presumably the user would like a self-view, and probably also the possibility to use audio only for communication. Additionally the session control signaling is left out - the "URI of receiver" is magically known. But given the low level of the APIs it would be very simple to add a self-view - just display the captureStream in another <video> element - and to allow switching between audio only and audiovisual. And session control can easily be supported by out of band signaling.
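As a rough illustration of the self-view idea, the sketch below points a second <video> element at the capture stream. The helper attachSelfView and the element id "self_view" are hypothetical, and stream.URL is assumed to behave like the remoteStream.URL used in the example above; it is part of the proposal, not a shipped API.

```javascript
// Sketch only: `stream.URL` mirrors the `remoteStream.URL` usage in the
// example above and is an assumption about the proposed Stream API.
function attachSelfView(stream, selfViewElement) {
  if (!stream || typeof stream.URL !== "string") {
    return false;                  // no capture stream yet (access not granted)
  }
  selfViewElement.src = stream.URL; // render the local capture
  selfViewElement.muted = true;     // keep the local microphone out of the speakers
  return true;
}
```

It could be called from the <device> onchange handler, for example attachSelfView(captureStream, document.getElementById("self_view")), right after captureStream has been assigned. Muting the self-view is a design choice to avoid acoustic feedback; the remote party still receives the audio through the transceiver.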
Our plan is to divide the work in stages, something like
- Implementing stream (as opposed to file) handling, play streams on <audio> and <video>
- Implementing stream capture (with <device> and Stream API), locally play out what is captured (self-view). Update: A post on this is now published!
- Implementing WebSocket transport, play out something that is captured on another device
Our next post will follow soon!
--Stefan Håkansson, Adam Bergkvist, Nicklas Sandgren


