Beyond HTML5 - Conversational Voice and Video Implemented in WebKit GTK+
The last blog post in our Beyond HTML5-series, Beyond HTML5 - Implementing
Device Element and Stream API Improvements
Before going into the newly implemented features, we should mention some improvements to the components demonstrated in the previous post. To start with, the device element now supports the type "media", in addition to "audio" and "video", which allows the user to select input devices for audio and/or video. The result is a stream that may contain both audio and video components. A URL fragment identifier, from the File API specification, can be used to single out a specific component of an aggregated stream. The video chat example code below uses the fragment identifier "#video" to avoid playing back the recorded audio in the self-view.
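As a rough illustration of the fragment-identifier idea, the helper below builds a component URL from a stream URL. The function name componentUrl and the example stream URL are hypothetical. Only the "#video" fragment appears in this post, and "#audio" is assumed by analogy.

```javascript
// Hypothetical helper: select one component of an aggregated stream by
// appending a fragment identifier to the stream URL.
// Only "#video" appears in this post; "audio" is assumed by analogy.
function componentUrl(streamUrl, component) {
  const allowed = ["audio", "video"];
  if (!allowed.includes(component)) {
    throw new Error("Unknown stream component: " + component);
  }
  // Drop any fragment that is already present before adding the new one.
  const base = streamUrl.split("#")[0];
  return base + "#" + component;
}

// Example with a made-up stream URL.
console.log(componentUrl("stream:1234", "video"));
```

In the video chat example, the same effect is achieved inline by concatenating "#video" to the stream URL before assigning it to the self-view video element.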

The device selector for type "media" allows the user
to select both audio and video devices
MediaStreamTransceiver and WebSocket Transport
What we need to extend the self-view example to a simple video chat web application is a way to share your local audio-visual stream, and receive the corresponding stream from the remote party. This is exactly what the MediaStreamTransceiver does. The MediaStreamTransceiver is constructed with a websocket URL to the media relay server (see next section) and has two stream properties - one for the stream you want to transmit (localStream) and one for the incoming stream (remoteStream). Once the connected event is fired you can play the remote stream in, e.g., a video element.
At an earlier planning stage the MediaStreamTransceiver used a separately constructed websocket as transport. However, that approach was abandoned in favor of internal websockets to protect the channels and avoid having to multiplex media data and data posted using an external websocket handle. Another benefit of hiding the transport details is that the developer does not have to worry about how many websocket channels are needed to transport a stream. In our current implementation we use one websocket channel per stream component.
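To make the "one channel per stream component" idea concrete, here is a minimal sketch of how a transceiver might plan its channels. This is not our actual implementation: the function name planChannels, the descriptor shape and the relay host are assumptions made for illustration. Labelling each channel with its component type follows the websocket-protocol convention described in the Media Relay section.

```javascript
// Sketch: plan one websocket channel per stream component.
// The descriptor shape ({ url, protocol }) is an assumption for illustration.
function planChannels(relayUrl, components) {
  return components.map(function (component) {
    // The component type doubles as the protocol label of the channel.
    return { url: relayUrl, protocol: component };
  });
}

// Placeholder relay URL; a stream with audio and video needs two channels.
const plan = planChannels("ws://relay.example:8880/delayswitch?sid=0", ["audio", "video"]);
plan.forEach(function (channel) {
  console.log(channel.protocol + " -> " + channel.url);
});
```

In a browser, each descriptor could in principle be opened with the standard constructor call new WebSocket(channel.url, channel.protocol). The point of the MediaStreamTransceiver, however, is that the web developer never has to do this by hand.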
Media Relay
For this demo we use a simple websocket server as a media relay. It works by connecting clients, using the same session identifier (sid), to each other and forwarding media data. The websocket-protocol header field is used to identify the content of a websocket channel, i.e. the media component type, to connect it to the corresponding channel on the recipient side.
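The pairing rule can be modelled in a few lines without any real sockets. The sketch below is an in-memory illustration only, not the relay server used in the demo. The class name RelayModel, the channel object shape and the "first two matching channels are paired" policy are all assumptions.

```javascript
// In-memory model of the relay's pairing rule; no real sockets are involved.
// Channels with the same session id (sid) and the same protocol label are
// connected to each other, and data sent on one is forwarded to the other.
class RelayModel {
  constructor() {
    this.waiting = new Map(); // "sid/protocol" -> channel waiting for a peer
  }

  // channel: any object with a receive(data) method (hypothetical shape).
  connect(sid, protocol, channel) {
    const key = sid + "/" + protocol;
    const peer = this.waiting.get(key);
    if (peer === undefined) {
      this.waiting.set(key, channel);
      return false; // no matching peer yet
    }
    this.waiting.delete(key);
    channel.peer = peer;
    peer.peer = channel;
    return true; // paired with the waiting channel
  }

  // Forward data from one channel to its paired peer, if it has one.
  forward(channel, data) {
    if (!channel.peer) {
      return false;
    }
    channel.peer.receive(data);
    return true;
  }
}

// Test double standing in for a websocket channel.
function makeChannel() {
  const received = [];
  return { received: received, receive: function (data) { received.push(data); } };
}

const relay = new RelayModel();
const aliceVideo = makeChannel();
const bobVideo = makeChannel();
const bobAudio = makeChannel();
relay.connect("0", "video", aliceVideo);
relay.connect("0", "audio", bobAudio); // different component: keeps waiting
relay.connect("0", "video", bobVideo); // pairs with aliceVideo
relay.forward(aliceVideo, "frame-1");
console.log(bobVideo.received);
```

Keying on both the session identifier and the protocol label is what keeps an audio channel from being connected to a video channel within the same session.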
Demo - A Simple Video Chat
The HTML code below is based on the self-view example from the previous blog post, but has been extended to support video chat.
<html>
  <head>
    <title>Video Chat</title>
    <script type="text/javascript" src="device_dialog.js"></script>
    <script type="text/javascript" src="wow_feature.js"></script>
    <script type="text/javascript">
      window.onload = function () {
        var transceiver = new MediaStreamTransceiver("ws://150.132.141.60:8880/delayswitch?sid=0");
        var videoDevice = document.getElementsByTagName("device")[0];

        videoDevice.onchange = function (evt) {
          var videoStream = videoDevice.data;
          var selfView = document.getElementById("self_view");

          // exclude audio from the self view
          selfView.src = videoStream.url + "#video";
          selfView.play();

          // set the stream to share
          transceiver.localStream = videoStream;
        };

        transceiver.onconnect = function () {
          var remoteVideo = document.getElementById("remote_video");

          // play the incoming stream
          remoteVideo.src = transceiver.remoteStream.url;
          remoteVideo.play();
        };
      }
    </script>
  </head>
  <body>
    <div><device type="media"></div>
    <div style="float:left">
      <p>Self-view:</p>
      <video width="320" height="240" id="self_view"></video>
    </div>
    <div style="float:left">
      <p>Remote video:</p>
      <video width="320" height="240" id="remote_video"></video>
    </div>
  </body>
</html>
The video chat web application is basically the self-view example with two additions: first, a MediaStreamTransceiver instance that shares your local stream with a remote party and receives the remote media stream; second, a second video element that plays the remote media stream. The video below shows the demo web application in action.
Summary and Conclusions
We have demonstrated conversational voice and video in WebKit GTK+. To accomplish this, we have made the following modifications:
- Implemented the device element and the Stream API (device element GUI is currently written in JavaScript/CSS)
- Added MediaStreamManager to map Stream URLs to the corresponding pipeline in the media backend
- Added MediaStreamTransceiver to control the related media processing and transport
- Added support for binary data in the WebSocket protocol
That is it! No modifications have been made to the media framework or the underlying OS. We have tested our implementation on a couple of different device types and Linux distributions, and the performance (while not assessed in any formal test) seems quite OK. We are well aware of the drawbacks of using TCP transport for real-time media streams - perhaps RTP/UDP transport should be considered.
Our conclusion is that it is feasible to support conversational voice and video in a web browser/runtime. Note again that the example code above runs in our patched version of WebKit GTK+ and cannot be expected to work in a regular web browser.
- Adam Bergkvist, Nicklas Sandgren, Stefan Håkansson, Jonas Lundberg and Per-Erik Brodin



Comments
Now is there any chance that the patched version of WebKit could end up in my inbox for some internal testing? I have an idea I would like to play with and it needs the device tag....
Well here is hoping....;)
Not sure if you have seen this issue added to Chromium
http://code.google.com/p/chromium/issues/detail?id=55377
We are planning to contribute our implementation to WebKit as soon as possible, in fact it is the next thing on our agenda.
If you are interested in more details, send me an email at stefan.alund _at_ ericsson _dot_ com
Stefan Ålund,
Project Manager, Ericsson Research
Good afternoon. Is it possible to get the WebKit build demonstrated in the video chat example? Also, if possible, I would like to receive information about the WebSocket server communication used in the example, as well as the wow_feature.js and device_dialog.js files.
Hi Adam and Team,
Would it be possible to obtain an early release of the WebKit Toolkit you're using to put together this demo? I would love to be able to replicate and run this example on my end.
Please feel free to email me at my registered account or at mort253 at msn.com.
Thank you for putting together this demo!
James Mortensen
Hi Team,
This is fantastic! I can't wait to have your contribution to WebKit. I've been trying to capture my webcam using HTML5 but without success.
As James said before, is there a way to get the WebKit Toolkit you're using? I'd like to play with it a little bit without using flash.
my email is marcos.rimoldi _at_ gmail _dot_ com
Thanks a lot for your work,
Marcos.
Thanks for all the interest! We will unfortunately not release any previews of the code at this stage. Keep an eye on Labs blog for more details.
Stefan
Hi kuznetsovas
The WebSocket server is written in Python and works, as described in the post above, by forwarding media data between pairs of clients. It's for demo purposes and shouldn't really be used in a real life situation.
wow_feature.js contains a few lines of code to do the wobbly windows CSS transformations which are shown at the end of the video (1:55). They are triggered by clicking one of the videos.
device_dialog.js contains the implementation of the device element GUI. To have this implementation in HTML, CSS and JavaScript is a temporary solution that makes it easy to test various looks. It’s not specified how it should look yet. The items in the device dialog selector lists are populated by calling a custom function on the device element that queries the media backend for available devices. Another custom function is used to signal the user selections back down to the media backend. The previous blog post in our Beyond HTML5 series describes our implementation of the device element in more detail.
Hi team,
Can you send me the source code of the Python server? I was looking for a server written in Python to learn more about that. Please send it to ja0212_at_gmail_dot_com.
Thanks
I wonder about your choice of URL fragments. Did you consider using the ones specified at http://www.w3.org/TR/2010/WD-media-frags-20100413/ which are explicit for media resources? It would end up being #track=video or #track=audio here.
Hi Silvia
We were not aware that the Media Fragments URI draft specified how to select tracks. We should align with that to avoid fragmentation.
Hi Adam,
I'm working on a thesis related to HTML5 for my degree as a Systems Engineer. I would like to show what you have done (with proper acknowledgement). Would that be possible?
Thanks,
Alan
Hi Alan
Feel free to use and show the resources that we published here on labs. However, it's currently not possible to give you the code or a binary so that you can run the examples on your own.
Good luck with your thesis.
Hello,
I'm curious which codecs you used for audio and for video.
Can you choose which codecs to use for encoding?
Thank you.
Hi gigitek
We used Motion JPEG for video and Speex for audio in this demo, simply because they were available in GStreamer and worked fairly well for what we wanted to achieve. We can easily replace these codecs with something else provided by GStreamer.
Hi,
This is something that I am eagerly awaiting. It's time that someone freed us from the shackles of Adobe and Flash!
I have been looking into Adobe FMS 4 using RTMFP and this open source project Cumulus: https://github.com/OpenRTMFP/Cumulus
However, I gather the problem with RTMFP is that it uses UDP and, unless routers are configured for it, it will fail. I have seen elsewhere that even Flash apps that have RTMFP P2P functionality still fall back to RTMP and connect to a server. In my mind, that makes Adobe's effort useless for a P2P-only solution.
I mention the above because you say this:
"We are well aware of the drawbacks of using TCP transport for real-time media streams - perhaps RTP/UDP transport should be considered."
My own goal would be for no server to be involved, well only to manage the connection between user A and user B. No actual data for the stream should be passed through to the server. This would be a huge boon for someone like myself as then the costs for a P2P AV chat is much reduced. Servers such as Adobe FMS, Wowza, Red5 or C++ RTMP Server would not need to be used.
So my question is, can this goal be realised? Would a user only need internet ports (port 80) available and a webcam with webkit browser supporting the components for your code, for this to work?
Thanks
Paul
Have you checked out our latest blogpost about peer-to-peer conversational video in a browser (https://labs.ericsson.com/developer-community/blog/beyond-html5-peer-pee...)? It uses ICE to find a path for media (RTP) directly between the clients, much like you describe in your preferred scenario.
Peer-to-peer streaming requires more than the common Internet ports, and I believe that some kind of HTTP fall-back will be needed to get this working everywhere (e.g. corporate networks with proxies).
BR
Adam
I'm taking a look at the WebRTC documentation, but wanted to know how to capture the user's camera without permission. Help me!
That is not possible, nor should it be. It is easy to imagine what kind of malicious site could be built unless the user is in control of when he/she is recorded.
Stefan
Hi,
I tried the code given as an example. It was not able to detect the connected webcam. Also, I could not find the two JS files device_dialog.js and wow_feature.js.
Can anyone tell me how I can connect to the webcam and capture images in HTML5, using Ubuntu 10.04 as the development environment?
Hi,
This post is over a year old. Newer versions of the sample code can be found at the bottom of this page: https://labs.ericsson.com/apis/web-real-time-communication/downloads
A more detailed post on how to get the sample code running can be found here: https://labs.ericsson.com/developer-community/blog/web-rtc-tutorial
Let us know if you still have problems.
This blog post is really old. Take a look at https://labs.ericsson.com/apis/web-real-time-communication/ to download and try our modified browser library.