Beyond HTML5 - Conversational Voice and Video Implemented in WebKit GTK+
The last blog post in our Beyond HTML5-series, Beyond HTML5 - Implementing
Device Element and Stream API Improvements
Before going into the newly implemented features, we should mention some improvements to the components demonstrated in the previous post. To start with, the device element now supports the type "media", in addition to "audio" and "video", which allows the user to select input devices for audio and/or video. The result is a stream that may contain both audio and video components. A URL fragment identifier, from the File API specification, can be used to single out a specific component of an aggregated stream. The video chat example code below uses the fragment identifier "#video" to avoid playing back the recorded audio in the self-view.
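The fragment identifier idea can be illustrated with a small sketch. The helper name selectComponent and the placeholder stream URL are assumptions made for illustration; the actual URL format is decided by the browser, and only the "#video" fragment comes from the text above.

```javascript
// Hypothetical helper (not part of the patched WebKit GTK+ API): pick one
// component of an aggregated stream by appending a fragment identifier.
function selectComponent(streamUrl, component) {
  const allowed = ["audio", "video"];
  if (!allowed.includes(component)) {
    throw new RangeError("unknown component: " + component);
  }
  // Drop any existing fragment so the result carries exactly one.
  const base = streamUrl.split("#")[0];
  return base + "#" + component;
}

// Placeholder URL for an aggregated audio + video stream (assumed format).
const aggregatedUrl = "blob:example-stream-id";

// The self-view plays only the video component, so the local audio is not
// played back to the user.
const selfViewSrc = selectComponent(aggregatedUrl, "video");
console.log(selfViewSrc); // blob:example-stream-id#video
```

In a page, the resulting string would be assigned to the src attribute of the self-view video element.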
MediaStreamTransceiver and WebSocket Transport
To extend the self-view example into a simple video chat web application, we need a way to share the local audio-visual stream and to receive the corresponding stream from the remote party. This is exactly what the MediaStreamTransceiver does. The MediaStreamTransceiver is constructed with a websocket URL to the media relay server (see next section) and has two stream properties - one for the stream you want to transmit (localStream) and one for the incoming stream (remoteStream). Once the connected event is fired you can play the remote stream in, e.g., a video element.
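The flow described above might look roughly like the following sketch. Because MediaStreamTransceiver exists only in the patched browser, a stand-in class is used so the snippet can run anywhere. The constructor argument and the localStream and remoteStream properties follow the description; the handler name onconnected, the url property on stream objects, the sid query parameter, and the simulateConnect helper are all assumptions.

```javascript
// Stand-in for the patched browser's MediaStreamTransceiver. It models only
// the shape described in the text; everything else is assumed.
class StubTransceiver {
  constructor(relayUrl) {
    this.relayUrl = relayUrl;  // websocket URL of the media relay server
    this.localStream = null;   // stream to transmit
    this.remoteStream = null;  // incoming stream, available once connected
    this.onconnected = null;   // assumed handler name for the connected event
  }

  // Illustration-only helper: pretend the relay paired us with a remote party.
  simulateConnect(remoteStream) {
    this.remoteStream = remoteStream;
    if (typeof this.onconnected === "function") {
      this.onconnected();
    }
  }
}

// Share the local stream and play the remote one when the connection is up.
function startChat(transceiver, localStream, remoteView) {
  transceiver.localStream = localStream;
  transceiver.onconnected = () => {
    remoteView.src = transceiver.remoteStream.url;
  };
}

const remoteView = { src: "" }; // stands in for a second <video> element
const transceiver = new StubTransceiver("ws://relay.example/?sid=42");
startChat(transceiver, { url: "blob:local" }, remoteView);
transceiver.simulateConnect({ url: "blob:remote" });
console.log(remoteView.src); // blob:remote
```

The remote view is only touched inside the connected handler, since remoteStream is not expected to be usable before that event.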
At an earlier planning stage the MediaStreamTransceiver used a separately constructed websocket as transport. However, that approach was abandoned in favor of internal websockets to protect the channels and avoid having to multiplex media data and data posted using an external websocket handle. Another benefit of hiding the transport details is that the developer does not have to worry about how many websocket channels are needed to transport a stream. In our current implementation we use one websocket channel per stream component.
For this demo we use a simple websocket server as a media relay. It works by connecting clients that use the same session identifier (sid) to each other and forwarding media data between them. The websocket-protocol header field is used to identify the content of a websocket channel, i.e. the media component type, so that it can be connected to the corresponding channel on the recipient side.
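The pairing rule can be modelled in a few lines. This is an in-memory sketch under stated assumptions, not the relay server used in the demo: channels are plain objects rather than websocket connections, and the names RelayModel, makeChannel, connect and forward are invented for illustration.

```javascript
// In-memory model of the relay's pairing rule: two channels are connected
// when they share both the session identifier (sid) and the protocol value
// that names the media component type.
class RelayModel {
  constructor() {
    this.waiting = new Map(); // pairing key -> channel waiting for a peer
  }

  // Register a channel. Returns the peer if one was waiting, else null.
  connect(channel) {
    const key = JSON.stringify([channel.sid, channel.protocol]);
    const peer = this.waiting.get(key);
    if (peer) {
      this.waiting.delete(key);
      channel.peer = peer;
      peer.peer = channel;
      return peer;
    }
    this.waiting.set(key, channel);
    return null;
  }

  // Forward media data to the paired channel. Returns false if unpaired.
  forward(channel, data) {
    if (!channel.peer) {
      return false;
    }
    channel.peer.received.push(data);
    return true;
  }
}

// Hypothetical channel record; a real relay would wrap a websocket here.
function makeChannel(sid, protocol) {
  return { sid, protocol, peer: null, received: [] };
}

const relay = new RelayModel();
const aliceVideo = makeChannel("42", "video");
const aliceAudio = makeChannel("42", "audio");
const bobVideo = makeChannel("42", "video");
relay.connect(aliceVideo);
relay.connect(aliceAudio);
relay.connect(bobVideo);              // pairs with aliceVideo, not aliceAudio
relay.forward(aliceVideo, "frame-1"); // bobVideo.received now holds "frame-1"
```

Keying on both values keeps the audio and video components of one session on separate channel pairs, which matches the one-channel-per-component design.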
Demo - A Simple Video Chat
The HTML code below is based on the self-view example from the previous blog post, but has been extended to support video chat.
The video chat web application is basically the self-view example with two additions. The first is a MediaStreamTransceiver instance that shares your local stream with a remote party and receives a remote media stream. The second is another video element that plays the remote media stream. The video below shows the demo web application in action.
Summary and Conclusions
We have demonstrated conversational voice and video in WebKit GTK+. To accomplish this, we have made the following modifications:
- Added MediaStreamManager to map Stream URLs to the corresponding pipeline in the media backend
- Added MediaStreamTransceiver to control the related media processing and transport
- Added support for binary data in the WebSocket protocol
That is it! No modifications have been made to the media framework or the underlying OS. We have tested our implementation on a couple of different device types and Linux distributions, and the performance (while not assessed in any formal test) seems quite OK. We are well aware of the drawbacks of using TCP transport for real-time media streams - perhaps RTP/UDP transport should be considered.
Our conclusion is that it would be feasible to support conversational voice and video in a web browser/runtime. Note again that the example code above runs in our patched version of WebKit GTK+ and cannot be expected to work in a regular web browser.
- Adam Bergkvist, Nicklas Sandgren, Stefan Håkansson, Jonas Lundberg and Per-Erik Brodin