Understanding WebSocket handshake

In this blog post I discuss in detail the handshake process in the WebSocket protocol. The first step in WebSocket communication is the formal handshake between the client and the server. The discussion is based on the latest specification, released on 16th August 2010 by I. Hickson. I go through the request and response headers in detail and also explain why each of them is needed.

If you are more interested in getting started with WebSocket, you may read the blog post Web Socket Communication Using jWebSocket.

Introduction to Web Socket

The ability to push data from server to client has been a long standing requirement for certain web based applications. The HTTP protocol requires the client to send a request in order to receive a response from the server. To overcome this shortcoming, workarounds such as polling, long polling and Comet based solutions are commonly used. The WebSocket protocol addresses this issue and provides a standard means of seamless communication between the client and the server. Both the client and the server can send messages to each other asynchronously. Clients have callback methods and servers can have listeners to react to events from the other side. This mode of communication is commonly referred to as full duplex communication, and since WebSocket is layered over TCP it uses the underlying TCP socket for communication. An additional advantage of the WebSocket protocol is that it scales well, because it reduces network traffic and latency.

The WebSocket protocol specification, together with the WebSocket API, forms the complete specification for using WebSockets.

Why handshake?

WebSocket communication is based on a dedicated connection between the client and the server, over which both parties can send data to each other. Before the connection is established, both parties must agree to it. This process is known as the handshake. The first step in WebSocket communication is therefore a formal handshake between the client and the server, and only the client can initiate it. The browser (client) initiates the handshake by sending an HTTP GET request with an offer to upgrade to the WebSocket protocol.

Delving deep into the handshake

I will now go through the various aspects of the WebSocket handshake and show how the individual request and response headers are used to facilitate it.

Request Headers

Based on the latest WebSocket specification, a typical handshake request carries the following headers.

  • Connection and Upgrade - The browser (client) initiates the handshake by sending an HTTP GET request, but the connection is to be established using the WebSocket protocol. The client request must therefore tell the server that the client wants to upgrade the connection from HTTP to WebSocket. To convey this, the client includes two headers
    • Connection: Upgrade - Tells the server that the connection is an upgrade type of connection.
    • Upgrade: WebSocket - Tells the server the protocol to which the connection is to be upgraded.
  • A brief context of Upgrade Header

    To ease the deployment of incompatible future protocols, HTTP/1.1 introduced the 'Upgrade' request header. By sending the Upgrade header, a client can inform a server of the set of protocols it supports as an alternate means of communication. The server may choose to switch protocols, but this is not mandatory.

    On the server side there will be checks to ensure that

    1. The request uses the GET method.
    2. The request has a Connection header with the value Upgrade and an Upgrade header with the value WebSocket.

    Only if both conditions are satisfied will the server process the other headers and finally establish the connection. The server side code can look somewhat like this

    // Reject anything that is not a GET request.
    if (!HttpMethod.GET.equals(aReq.getMethod())) {
        sendHttpResponse(aCtx, aReq, new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.FORBIDDEN));
        return;
    }
    // Serve the WebSocket handshake request only if both upgrade headers are present.
    if (HttpHeaders.Values.UPGRADE.equalsIgnoreCase(aReq.getHeader(HttpHeaders.Names.CONNECTION))
            && HttpHeaders.Values.WEBSOCKET.equalsIgnoreCase(aReq.getHeader(HttpHeaders.Names.UPGRADE))) {
        // Start processing the upgrade request.
    }
    
  • Host - The protocol must be protected against DNS rebinding attacks, and it should be possible to serve multiple domains from one IP address. With these requirements in mind, the WebSocket protocol specification requires the handshake request to contain the Host header. Attackers use DNS rebinding to read data from a private network.
  • Origin - The server should be able to make an informed decision about whether it wishes to accept a handshake invitation from a client. The Origin request header lets the client tell the server the origin of each request, and the server can use this information to decide whether the request is valid.
  • Sec-WebSocket-Protocol - Applications may define their own application-level protocols layered over the WebSocket protocol. Such protocols are called sub-protocols. The client should explicitly state the sub-protocol(s) it can use to exchange data with the server, and Sec-WebSocket-Protocol is the header that selects them. In this header the client can send a space separated list of strings, each naming a sub-protocol that the client can use. For example, a client may be capable of handling sub-protocols that send data packets in JSON, CSV or XML format. In that case the sub-protocol names are listed in this header.
  • Sec-WebSocket-Key1 and Sec-WebSocket-Key2 - Since the handshake request is a plain HTTP request, the server needs some way to recognise that it has received a valid WebSocket handshake request, so that it accepts connections only from genuine WebSocket clients. Protection against cross protocol attacks is provided by the Sec-WebSocket-Key1 and Sec-WebSocket-Key2 headers sent by the client. We will see in detail how these keys provide cross protocol security when we discuss the response headers.
  • Cookie - Like Sec-WebSocket-Protocol, Cookie is an optional header, which the client can use to send cookies to the server.
  • Last 8 bytes - At the end of the handshake request there are 8 bytes of data. The server uses Sec-WebSocket-Key1, Sec-WebSocket-Key2 and these 8 bytes to generate the challenge response. I will talk about the challenge response in detail when I discuss the response headers.
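
To make the list above concrete, here is a minimal Java sketch that assembles a handshake request of this shape as raw bytes. The class name HandshakeRequestSketch, the build method and all sample values (host, path, origin, sub-protocol and keys) are illustrative assumptions and do not come from any library. A real client would generate the two keys and the 8 trailing bytes randomly, as the specification describes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only, not part of any WebSocket library.
final class HandshakeRequestSketch {

    // Assembles request line, headers, an empty line and then the 8 raw key bytes.
    static byte[] build(String host, String path, String origin, String protocol,
                        String key1, String key2, byte[] lastEightBytes) {
        if (lastEightBytes.length != 8) {
            throw new IllegalArgumentException("The trailing key must be exactly 8 bytes");
        }
        StringBuilder head = new StringBuilder();
        head.append("GET ").append(path).append(" HTTP/1.1\r\n");
        head.append("Upgrade: WebSocket\r\n");
        head.append("Connection: Upgrade\r\n");
        head.append("Host: ").append(host).append("\r\n");
        head.append("Origin: ").append(origin).append("\r\n");
        if (protocol != null) {
            // Optional header: only sent when the client wants a sub-protocol.
            head.append("Sec-WebSocket-Protocol: ").append(protocol).append("\r\n");
        }
        head.append("Sec-WebSocket-Key1: ").append(key1).append("\r\n");
        head.append("Sec-WebSocket-Key2: ").append(key2).append("\r\n");
        head.append("\r\n");

        byte[] headBytes = head.toString().getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(lastEightBytes, 0, lastEightBytes.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Sample values only; they are not meaningful credentials.
        byte[] request = build("example.com", "/demo", "http://example.com", "sample",
                "4 @1  46546xW%0l 1 5", "12998 5 Y3 1  .P00",
                "^n:ds[4U".getBytes(StandardCharsets.US_ASCII));
        System.out.print(new String(request, StandardCharsets.US_ASCII));
    }
}
```

Note that the 8 bytes follow the empty line directly and are not announced by any header, which is why the sketch writes them as raw bytes rather than as part of the header text.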

Response Headers

On receiving the handshake request, the server processes it and creates a response. Based on the latest WebSocket specification, a typical handshake response contains the following parts.

  • HTTP-Status-Line - The first line is an HTTP Status-Line with the status code 101, for example HTTP/1.1 101 WebSocket Protocol Handshake (the HTTP version and reason phrase are not important).
  • Connection and Upgrade - For HTTP compatibility reasons the server response also includes the Upgrade and Connection headers.
  • Sec-WebSocket-Location - The server also returns the host information to the client through the Sec-WebSocket-Location header. This helps the client and the server agree on which host is in use for the interaction.
  • Sec-WebSocket-Origin - The server clearly specifies the origin(s) with which it agrees to create a WebSocket connection. This is very important from a security perspective.
  • Sec-WebSocket-Protocol - The server also responds with the sub-protocol it will use to interact with the client. This response completes the cycle and establishes the sub-protocol to be used for this connection.
  • Challenge response - The server needs to assure the client that the handshake request was received by a valid WebSocket server, that the server understood the request, and that it agrees to interact with the client using the WebSocket protocol. For this purpose the client sent three pieces of information to the server: Sec-WebSocket-Key1, Sec-WebSocket-Key2 and the 8 bytes of data at the end of the request. The server processes this data in the way laid down in the specification, creates a challenge response and puts it in the response. This is the answer to the challenge sent by the client. On receiving the handshake response from the server, the client checks the response code, the other headers and the challenge response. The connection is established only if, among other things, the challenge response matches the result expected by the client. Otherwise the connection is closed.
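
The processing that the specification prescribes for the challenge can be sketched in Java as follows. As I read the draft, the server concatenates the digits of each key into a number, divides that number by the count of spaces in the key, writes the two results as 4-byte big-endian integers, appends the 8 bytes from the end of the request and takes the MD5 digest of those 16 bytes. The class and method names are hypothetical, the sample keys are made up for illustration, and error handling is kept to a minimum.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only, not part of any WebSocket library.
final class ChallengeResponseSketch {

    // Concatenates the digits found in the key into one number and divides it
    // by the number of space characters in the key.
    static long keyNumber(String key) {
        long digits = 0;
        int spaces = 0;
        for (char c : key.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digits = digits * 10 + (c - '0');
            } else if (c == ' ') {
                spaces++;
            }
        }
        // A key without spaces, or one that does not divide evenly, is rejected.
        if (spaces == 0 || digits % spaces != 0) {
            throw new IllegalArgumentException("Not a valid handshake key: " + key);
        }
        return digits / spaces;
    }

    // Builds the 16-byte challenge and returns its 16-byte MD5 digest.
    static byte[] challengeResponse(String key1, String key2, byte[] lastEightBytes) {
        if (lastEightBytes.length != 8) {
            throw new IllegalArgumentException("Expected exactly 8 trailing bytes");
        }
        ByteBuffer challenge = ByteBuffer.allocate(16); // ByteBuffer is big-endian by default
        challenge.putInt((int) keyNumber(key1));        // keeps the low 32 bits
        challenge.putInt((int) keyNumber(key2));
        challenge.put(lastEightBytes);
        try {
            return MessageDigest.getInstance("MD5").digest(challenge.array());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Made-up keys: "8 4 0" gives 840 / 2 and "1a2 b0" gives 120 / 1.
        byte[] response = challengeResponse("8 4 0", "1a2 b0",
                new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        StringBuilder hex = new StringBuilder();
        for (byte b : response) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
    }
}
```

The spaces and the non-digit characters in the keys are what make them awkward to produce from an ordinary form submit or script, while the division and the digest force the server to have actually parsed the handshake.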

All these detailed handshake rules have been laid out to ensure the following

  • The client and the server can each make an informed decision about whether to allow a WebSocket connection between them. The Host, Sec-WebSocket-Location and Origin headers help ensure this.
  • No impostor should be able to break the cross protocol security. This means that a WebSocket server should be able to clearly distinguish a real WebSocket handshake from a carefully crafted form submit or an XMLHttpRequest pretending to be a WebSocket handshake request. Quoting from the specification

    This is primarily achieved by requiring that the server prove that it read the handshake, which it can only do if the handshake contains the appropriate parts which themselves can only be sent by a
    WebSocket handshake; in particular, fields starting with |Sec-| cannot be set by an attacker from a Web browser, even when using |XMLHttpRequest|.

    Similarly, the client checks that the handshake response received from the server contains the correct challenge response, which is based on the keys sent by the client to the server. This ensures that no other server side code can pretend to be the WebSocket server.
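
The client side check described above might look like the following Java sketch. It is a simplification under stated assumptions: the whole response is already in one byte array, the last 16 bytes are the challenge response, and the expected value has been computed separately from the keys the client sent. The class and method names are hypothetical, and the Sec-WebSocket-Origin and Sec-WebSocket-Location checks that a real client also performs are left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical helper for illustration only, not part of any WebSocket library.
final class HandshakeResponseCheckSketch {

    // Returns true only if the status code is 101, both upgrade headers are
    // present and the trailing 16 bytes equal the expected challenge response.
    static boolean accept(byte[] response, byte[] expectedChallenge) {
        if (response.length < 16 + 4) {
            return false;
        }
        int headLength = response.length - 16;
        String head = new String(response, 0, headLength, StandardCharsets.ISO_8859_1);
        if (!head.endsWith("\r\n\r\n")) {
            return false;
        }
        String[] lines = head.split("\r\n");
        if (lines.length == 0) {
            return false;
        }
        // Only the status code matters; the version and reason phrase are ignored.
        String[] status = lines[0].split(" ", 3);
        if (status.length < 2 || !status[1].equals("101")) {
            return false;
        }
        boolean upgrade = false;
        boolean connection = false;
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) {
                continue;
            }
            String name = lines[i].substring(0, colon).trim();
            String value = lines[i].substring(colon + 1).trim();
            if (name.equalsIgnoreCase("Upgrade")) {
                upgrade = value.equalsIgnoreCase("WebSocket");
            } else if (name.equalsIgnoreCase("Connection")) {
                connection = value.equalsIgnoreCase("Upgrade");
            }
        }
        byte[] actual = Arrays.copyOfRange(response, headLength, response.length);
        return upgrade && connection && MessageDigest.isEqual(actual, expectedChallenge);
    }
}
```

If accept returns false, the client closes the connection instead of treating it as an open WebSocket.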

In my next blog post I will discuss how data is exchanged after the handshake (data frames, frame types and sub-protocols).

Author: Shankar, Senior Consultant at Xebia India.
Organization URL: http://xebiaindia.com/