Fetching responsive image format

I just read Jason Grigsby’s post and tried to answer it in the comments, but my response grew well beyond the length of a reasonable comment. So here I am.

This post is a proposal for a file structure that will enable browsers to fetch images encoded using a responsive image format.

But which format?

Regardless of which image format is eventually used, a large part of the problem is finding a way to download only the required parts of the responsive image, without downloading unneeded image data and without resetting the TCP connection.

In any case, the format itself should be constructed in layers, where the first layer contains the image’s lowest resolution and each further layer adds more detail. JPEG’s progressive-mode scans are one example of such layers.

Earlier proposals

In a recent discussion, Jason pointed me to a proposal for a responsive image format. While I didn’t find the proposal practical because of its use of JPEG-XR, I did like the way it suggested handling the fetching of the different layers (for different resolutions). Actually, I liked it more than my own proposal to use ranges.

The main disadvantage of this method is that it may cost up to a full round-trip time (RTT) per layer to fetch an image. If an image has more than a simple low/high-resolution pair of layers, the delay can quickly add up.

Responsive image file structure

  • The image will be split into two or more files
  • Each one of these files will have its own URL
  • The image’s header and the first (lowest resolution) layer will be in a single file. This file’s URL will be part of the HTML and will trigger fetching of the image.
  • Other files may contain one or more layers
  • If a file contains more than a single layer, the layers must be in ascending order, from lower resolution to higher.
  • The first layer should contain metadata that includes the number of files, which layers each file contains, and the byte offset of each layer inside each file.
  • The HTTP response headers of the first layer’s file should contain a list of the URLs of the files that hold the follow-up layers.
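To make the rules above concrete, here is a minimal Python sketch of the layer metadata and a check of the structural constraints. The data model (`LayerEntry`, `LayerFile`, `validate`) and the example URLs are hypothetical; the proposal defines what the metadata must contain, not how it is encoded.

```python
from dataclasses import dataclass


@dataclass
class LayerEntry:
    layer: int   # layer index; 0 is the lowest resolution
    offset: int  # byte offset of the layer inside its file


@dataclass
class LayerFile:
    url: str
    layers: list[LayerEntry]


def validate(files: list[LayerFile]) -> None:
    """Raise ValueError if the (hypothetical) metadata breaks the structure rules."""
    if len(files) < 2:
        raise ValueError("the image must be split into two or more files")
    if [e.layer for e in files[0].layers] != [0]:
        raise ValueError("the first file must hold the first layer only")
    all_layers = []
    for f in files:
        if not f.layers:
            raise ValueError(f"{f.url} contains no layers")
        # Within a file, layer indexes and byte offsets must both strictly increase.
        pairs = zip(f.layers, f.layers[1:])
        if any(a.layer >= b.layer or a.offset >= b.offset for a, b in pairs):
            raise ValueError(f"layers in {f.url} are not in ascending order")
        all_layers.extend(e.layer for e in f.layers)
    # Across all files, every layer 0..n-1 must appear exactly once.
    if sorted(all_layers) != list(range(len(all_layers))):
        raise ValueError("every layer must appear exactly once")


meta = [
    LayerFile("photo.base", [LayerEntry(0, 512)]),  # header, then the first layer
    LayerFile("photo.rest", [LayerEntry(1, 0), LayerEntry(2, 4000)]),
]
validate(meta)  # passes silently
```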
Image loading process

The browser will fetch the image’s first-layer file as part of the page’s loading process, using the lookahead pre-parser. That first layer will give the browser all the information it needs to download further layers (which might be in one or more additional files) as it sees fit. How further layers are fetched depends on the file structure: files that contain only needed layers will be fetched in their entirety, while for files that also contain unneeded layers, “Range” requests will be used.
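The decision described above (fetch a whole file when every layer in it is needed, use “Range” otherwise) can be sketched as follows. This is a minimal illustration under assumptions: the metadata is represented as plain tuples, the file names are made up, and `plan_requests` is a hypothetical helper rather than anything a browser ships. Because layers inside a file are stored in ascending order, the needed layers in any file always form a prefix, so a single byte range per file is enough.

```python
def plan_requests(follow_up_files, layers_needed):
    """Return (url, range_header) pairs; range_header is None for a plain GET.

    follow_up_files: [(url, [(layer_index, byte_offset), ...]), ...] for every
        file except the first one, with layers in ascending order
    layers_needed: how many layers to display, counting the first layer
    """
    plan = []
    for url, layers in follow_up_files:
        # Layers are stored in ascending order, so the needed ones form a prefix.
        needed = [(index, offset) for index, offset in layers if index < layers_needed]
        if not needed:
            continue  # this file only holds higher layers: skip it
        if len(needed) == len(layers):
            plan.append((url, None))  # every layer is needed: fetch the whole file
        else:
            start = needed[0][1]
            end = layers[len(needed)][1] - 1  # last byte before the first unneeded layer
            plan.append((url, f"bytes={start}-{end}"))  # HTTP byte ranges are inclusive
    return plan


files = [("photo.layers1-2", [(1, 0), (2, 4000)]), ("photo.layer3", [(3, 0)])]
print(plan_requests(files, 2))  # [('photo.layers1-2', 'bytes=0-3999')]
```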


This file structure gives the author enough flexibility to arrange the image’s layers in an optimal way. If the author knows that their server and front-end cache support the HTTP “Range” header, they can use a single file to serve all the layers beyond the first one. If not, the author can serve each layer in a file of its own.
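The two arrangements can be illustrated by computing the byte offsets that would end up in the metadata. In this sketch, `arrange_layers` and the `.layers`/`.layerN` URL suffixes are hypothetical naming choices; the layer sizes would come from the encoder.

```python
from itertools import accumulate


def arrange_layers(base_url, layer_sizes, range_supported):
    """Lay out layers 1..n (layer 0 stays in the first file).

    Returns [(url, [(layer_index, byte_offset), ...]), ...] for the metadata.
    """
    if range_supported:
        # One file for everything beyond the first layer; offsets are running totals.
        offsets = [0, *accumulate(layer_sizes)][:-1]
        return [(f"{base_url}.layers", [(i + 1, off) for i, off in enumerate(offsets)])]
    # No "Range" support: one file per layer, each layer starting at byte 0.
    return [(f"{base_url}.layer{i + 1}", [(i + 1, 0)]) for i in range(len(layer_sizes))]


print(arrange_layers("photo", [4000, 9000, 20000], True))
# [('photo.layers', [(1, 0), (2, 4000), (3, 13000)])]
```

The single-file layout keeps the number of URLs minimal but relies on “Range” working end to end; the one-file-per-layer layout works with any server, at the cost of more requests.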

From the browser’s perspective, this structure enables it to fetch additional layers as soon as it knows the dimensions at which the image will be displayed. Additional layers can be fetched using “Range” (where supported) or using separate HTTP requests. When separate HTTP requests are used, the browser can also issue them in parallel, since it has the URLs for all the layers it needs once it has received the first layer. In this case, the requests for the different layers can also be pipelined.
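Because every follow-up URL is known once the first file arrives, the requests don’t depend on each other and can all be in flight at the same time, so the wait is roughly one round trip rather than one per layer. The asyncio sketch below shows that shape; `fake_fetch` is a stand-in for a real HTTP client (it only sleeps), so the URLs are placeholders and nothing touches the network.

```python
import asyncio


async def fake_fetch(url, state):
    """Stand-in for an HTTP request; records how many requests overlap."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # pretend network latency
    state["in_flight"] -= 1
    return f"<bytes of {url}>"


async def fetch_follow_up_layers(urls):
    """Start one request per follow-up file at once and wait for all of them."""
    state = {"in_flight": 0, "peak": 0}
    bodies = await asyncio.gather(*(fake_fetch(url, state) for url in urls))
    return bodies, state["peak"]


bodies, peak = asyncio.run(fetch_follow_up_layers(["photo.layer1", "photo.layer2", "photo.layer3"]))
print(peak)  # 3: all three requests were in flight together
```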

By definition, the browser needs to wait for the layout phase in order to be absolutely sure it needs to download follow-up layers. If that proves to be a performance bottleneck, the browser can heuristically download follow-up layers before it is certain they are needed (based on viewport size, image dimensions, etc.).
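One possible form of such a heuristic is to pick the smallest number of layers whose decoded width covers the display width multiplied by the device pixel ratio; before layout, the viewport width can serve as an upper bound for the display width. The function name and the assumption that the width reached after each layer is known (it isn’t part of the metadata listed above) are mine; the proposal leaves the heuristic to the browser.

```python
import math


def layers_needed(layer_widths, display_width, device_pixel_ratio=1.0):
    """Smallest number of layers whose decoded width covers the displayed size.

    layer_widths: image width in pixels after decoding 1, 2, ... layers (ascending)
    display_width: CSS width of the image; before layout, the viewport width
        can stand in as an upper bound
    """
    target = math.ceil(display_width * device_pixel_ratio)
    for count, width in enumerate(layer_widths, start=1):
        if width >= target:
            return count
    return len(layer_widths)  # even the full image is smaller: fetch every layer


widths = [400, 800, 1600]  # hypothetical: each layer doubles the resolution
print(layers_needed(widths, 360))       # 1
print(layers_needed(widths, 360, 2.0))  # 2
```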

Another advantage is that for “non-responsive” images, the browser simply downloads the image itself. There’s no need to declare in the markup whether an image is responsive or not.


Compared to simple image fetching, fetching images with the technique described above may incur a delay of up to a single RTT when “Range” is supported. If “Range” is not supported, the delay per image may grow, although it is unlikely to reach the maximal “RTT per layer” cost. This disadvantage is probably negligible compared to the time saved by sending fewer bytes over the wire.

On the other hand, for Retina display devices that download all of the image’s layers, this delay may be noticeable.
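The trade-off in the last two paragraphs comes down to simple arithmetic: layered fetching wins when the transfer time of the skipped bytes exceeds the cost of the extra round trips. A back-of-the-envelope sketch; the function is hypothetical and the figures in the usage lines are illustrative assumptions, not measurements.

```python
def layered_fetch_gain_ms(bytes_saved, bandwidth_bytes_per_s, rtt_ms, extra_round_trips=1):
    """Milliseconds gained by layered fetching; a negative result means it is slower.

    bytes_saved: image bytes the browser did not need to download
    extra_round_trips: 1 when "Range" is supported, possibly more otherwise
    """
    time_saved_ms = bytes_saved * 1000 / bandwidth_bytes_per_s
    return time_saved_ms - extra_round_trips * rtt_ms


# Illustrative: 150,000 bytes skipped on a 2 Mbit/s (250,000 B/s) link with a 100 ms RTT.
print(layered_fetch_gain_ms(150_000, 250_000, 100))  # 500.0
# A Retina device that needs every layer saves no bytes and only pays the extra RTT.
print(layered_fetch_gain_ms(0, 250_000, 100))  # -100.0
```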


Written by Yoav Weiss

