I came up with this title all by myself
The rest is stuff I got off the Internet

Responsive Image Container

It's been a year since I last wrote about it, but the dream of the "magical image format" that will solve world hunger and/or the responsive images problem (whichever one comes first) lives on.

A few weeks back I started wondering if such an image format can be used to solve both the art-direction and resolution-switching use-cases.

I had a few ideas on how this can be done, so I created a prototype to prove that it's feasible. This prototype is now available, ready to be tinkered with.

In this post I'll try to explain what this prototype does, what it cannot do, how it works, and its advantages and disadvantages over markup solutions. I'll also try to de-unicorn the responsive image format concept, and make it more tangible and less magical.

You've got something against markup solutions?

No, I don't! Honest! Some of my best friends are markup solutions.

I've been part of the RICG for a while now, prototyping, promoting and presenting markup solutions. Current markup solutions (picture and srcset) are great and can cover all the important use cases for responsive images, and if it were up to me, I'd vote for shipping both picture and srcset (in its resolution switching version) in all browsers tomorrow.

But the overall markup based solution has some flaws.

Here's some of the criticism I've been hearing for the last year or so when talking about responsive images markup solutions.

Too verbose

Markup solutions are by definition verbose, since they must enumerate all the various resources. When art-direction is involved, they must also state the breakpoints, which adds to that verbosity.

Mixing presentation and content

An art-direction markup solution needs to keep layout breakpoints in the markup. That mixes presentation and content, and means that layout changes will force markup changes.

There have been constructive discussions on how this can be resolved, by bringing back the MQ definitions into CSS, but it's not certain when any of this will be defined and implemented.

Define viewport based breakpoints

This one is heard often from developers. For performance reasons, markup based solutions are based on the viewport size, rather than on the image's dimensions. Since the images' layout dimensions are not yet known to the browser by the time it starts fetching images, it cannot rely on them to decide which resource to fetch.

For developers, that means that some sort of "viewport=>dimensions" table needs to be created on the server-side/build-step or inside the developer's head in order to properly create images that are ideally sized for a given viewport size and layout.

While a build step can resolve that issue in many cases, it can get complicated in cases where a single component is used over multiple pages, with varying dimensions in each.

Result in excessive download in some cases

OK, this one is something I hear mostly in my head (and from other Web performance freaks on occasion).

From a performance perspective, any solution that's based on separate resources for different screen sizes/dimensions requires re-downloading of the entire image if the screen size or dimensions change to a higher resolution than before. Since it's highly possible that most of that image data is already in the browser's memory or cache, re-downloading everything from scratch makes me sad.

All of the above made me wonder (again) how wonderful life would be if we had a file format based solution, that can address these concerns.

Why would a file format do better?

  • The burden is put on the image encoder. The markup stays identical to what it is today. A single tag with a single resource.
  • Automated conversion of sites to such a responsive images solution may be easier, since the automation layer would just focus on the images themselves rather than the page's markup and layout.
  • Image layout changes (following viewport dimension changes) can be handled by downloading only the difference between current image and the higher resolution one, without re-downloading the data that the browser already has in its memory.
  • Web developers will not need to maintain multiple versions of each image resource, even though they would have to keep a non-responsive version of the image, for content negotiation purposes.

This is my attempt at a simpler, file format based solution that will let Web developers do much less grunt work, avoid downloading useless image data (even when conditions change), while keeping preloaders working.

Why not progressive JPEG?

Progressive JPEG can fill this role for the resolution switching case, but it's extremely rigid.

There are strict limits on the lowest image quality, and from what I've seen, it is often too data-heavy. The minimal difference between resolutions is also limited, and doesn't give enough control to encoders that want to do better.

Furthermore, progressive JPEG cannot do art-direction at all.

What would it look like?

A responsive image container, containing internal layers that can be either WebP, JPEG-XR, or any future format. It uses resizing and crop operations to cover both the resolution switching and the art direction use cases.

The decoder (e.g. the browser) will then be able to download just the number of layers it needs (and their bytes) in order to show a certain image. Each layer will provide enhancement on the layer before it, giving the decoder the data it needs to show it properly in a higher resolution.

How does it work?

  • The encoder takes the original image, along with a description of the required output resolutions and optionally art-direction directives.
  • It then outputs a layer per resolution that the final image should be perfectly rendered in.
  • Each layer represents the difference in image data between the previous layer, when "stretched" on the current layer's canvas, and the current layer's "original" image. That way, the decoder can construct the layers one by one, each time using the previous layer to recreate ...
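To make the "difference layer" idea a bit more tangible, here is a minimal sketch of it. It is not the prototype's code: it assumes 1-D rows of pixel values instead of real 2-D images, nearest-neighbour stretching, and lossless differences, and the function names are made up for illustration.

```python
def upscale(pixels, new_width):
    """Nearest-neighbour stretch of a 1-D row of pixel values to new_width."""
    old_width = len(pixels)
    return [pixels[i * old_width // new_width] for i in range(new_width)]


def encode_layers(versions):
    """versions: rows of increasing width, one per target resolution.

    Returns the smallest row as-is, followed by one difference layer
    per larger row (target minus the stretched previous row).
    """
    layers = [list(versions[0])]
    previous = list(versions[0])
    for target in versions[1:]:
        stretched = upscale(previous, len(target))
        layers.append([t - s for t, s in zip(target, stretched)])
        previous = list(target)  # lossless sketch: the decoder rebuilds target exactly
    return layers


def decode_layers(layers, count):
    """Rebuild the image using only the first `count` layers."""
    image = list(layers[0])
    for diff in layers[1:count]:
        stretched = upscale(image, len(diff))
        image = [s + d for s, d in zip(stretched, diff)]
    return image


small = [10, 20]
medium = [10, 12, 20, 22]
layers = encode_layers([small, medium])
```

With these inputs the second layer only carries the small corrections `[0, 2, 0, 2]`, and a decoder that stops after the first layer still has a complete low-resolution image.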

How Big Is Art-Direction?

For a while now, the art-direction use-case has been treated by browser vendors as resolution-switching's imaginary friend.

When talking to people who work for browser vendors about that use-case, I've heard phrases like "that's a really obscure use-case" and "No one is really doing art-direction".

This got me wondering — how big is that use-case? How many Web developers & designers are willing to go the extra mile, optimize their images (from a UI perspective), and adapt them so that they'd be a perfect fit to the layout they're in?


With the lack of solid data on the subject, I had to go get some :)

Arguably, one of the easiest ways for Web developers to implement art-direction today is to use picturefill — the framework that polyfills the picture element's syntax. So all I had to do is find sites using picturefill and see which ones use the framework for art-direction rather than simple resolution-switching.

I used the WebDevData scripts to get hold of the HTML of Alexa's top 50K websites. Then I grepped through those HTML files to find pages that contain "data-picture" (the data attribute used by picturefill), downloaded the images and (manually) went through the results to find which sites art-direct their images. Not very scalable, but it works for a small number of sites.


The results showed that 24% (7 out of 29) of the sites that use picturefill, use it to implement art-direction. While a larger sample would be better, this is a strong indication that the art-direction use-case is an important use-case for responsive images.
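As a rough illustration of the automated part of that process, the sketch below flags pages that contain the "data-picture" attribute and computes the share quoted above. The page sources and site names are hypothetical placeholders, not data from the survey, and the art-direction check itself stayed manual.

```python
def uses_picturefill(html):
    """True if the page contains picturefill's data-picture attribute."""
    return "data-picture" in html


def percentage(part, total):
    """Share of `part` in `total`, rounded to the nearest whole percent."""
    return round(100 * part / total)


# Hypothetical page sources; the real survey ran over downloaded HTML files.
pages = {
    "a.example": '<div data-picture data-alt="Hero"></div>',
    "b.example": '<img src="hero.jpg" alt="Hero">',
}
picturefill_sites = [site for site, html in pages.items() if uses_picturefill(html)]

# 7 art-directing sites out of 29 picturefill sites
art_direction_share = percentage(7, 29)
```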


Embedding the Gist with the results:

Fetching responsive image format

I just read Jason Grigsby's post, and tried to answer it in the comments, but saw that my response passed the limits of a reasonable comment. So here I am.

This post is a proposal for a file structure that will enable browsers to fetch images encoded using a responsive image format.

But which format?

Regardless of the image format that will eventually be used, a large part of the problem is coming up with a way to download only the required parts of the responsive image, without downloading unneeded image data and without resetting the TCP connection.

In any case, the format itself should be constructed in layers, where the first layer contains the image's lowest resolution, and each further layer adds more detail. An example of such layers are JPEG's progressive mode scans.

Earlier proposals

In a recent discussion, Jason linked me to a proposal for a responsive image format. While I didn't find the proposal practical because of its use of JPEG-XR, I did like the way it suggested to handle fetching of the different layers (for different resolutions). Actually, I liked it more than I liked my own proposal to use ranges.

The main disadvantage of this method is that it may cost up to a full round-trip time (RTT) per layer to fetch an image. If you have more than a simple low/high resolution pair of layers, the delay might quickly add up.

Responsive image file structure

  • The image will be split into two or more files
  • Each one of these files will have its own URL
  • The image's header and the first (lowest resolution) layer will be in a single file. This file's URL will be part of the HTML and will trigger fetching of the image.
  • Other files may contain one or more layers
  • If a file contains more than a single layer, the layers must be in ascending order, from lower resolution to higher one.
  • The first layer should contain meta data that includes the number of files, which layers each file contains and the byte offset of each layer inside each file.
  • The HTTP response headers of the first layer should contain a list of files to the followup layers.

Image loading process

    The browser will fetch the image's first layer file, as part of the page's loading process, using the lookahead pre-parser. That first layer will provide the browser with all the information it needs to further download more layers (which might be in one or more further files) as it sees fit. Fetching more layers will be based on the file structure. Files that only contain needed layers will be fetched in their entirety. For files that also contain unneeded layers, "Range" requests will be used.
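    The fetching decision described above can be sketched roughly as follows. This is only an illustration: the `Layer` and `LayerFile` structures, the URLs and the byte sizes are assumptions made up for the example, since the proposal does not define a concrete metadata syntax.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    index: int    # 1 is the first layer after the one in the initial file
    offset: int   # byte offset of the layer inside its file
    length: int   # byte length of the layer


@dataclass
class LayerFile:
    url: str
    layers: list  # Layer objects, ordered from lower to higher resolution


def plan_requests(files, highest_needed):
    """Return (url, headers) pairs for the follow-up files the browser needs."""
    requests = []
    for f in files:
        needed = [layer for layer in f.layers if layer.index <= highest_needed]
        if not needed:
            continue  # nothing useful in this file
        if len(needed) == len(f.layers):
            requests.append((f.url, {}))  # fetch the whole file
        else:
            # Layers are in ascending order, so the needed ones form a prefix.
            last = needed[-1]
            end = last.offset + last.length - 1  # Range ends are inclusive
            requests.append((f.url, {"Range": f"bytes={needed[0].offset}-{end}"}))
    return requests


files = [
    LayerFile("/img/hero.layer1", [Layer(1, 0, 4000)]),
    LayerFile("/img/hero.rest", [Layer(2, 0, 9000), Layer(3, 9000, 20000)]),
]
plan = plan_requests(files, highest_needed=2)
```

    Here the first follow-up file is fetched in its entirety, while the second one is fetched with `Range: bytes=0-8999`, which skips the unneeded third layer.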


    That file structure will give the author enough flexibility to arrange the image's layers in an optimal way. If the author knows that their server and front-end cache support the HTTP "Range" header, they can use a single file to serve all the layers beyond the first layer. If this is not the case, the author can serve each layer in a file of its own.

    From the browser's perspective, this structure enables it to fetch additional layers as soon as it knows the dimensions of the image to be displayed. Additional layers can be fetched using "Range" (where supported) or using separate HTTP requests. If separate HTTP requests are used, the browser can also issue them in parallel, since it has all the URLs for the layers it needs once it has the first layer. The requests for the different layers can also be pipelined in this case.

    By definition, the browser needs to wait for the layout phase in order to be absolutely sure it needs to download followup layers. If that would prove to be a performance bottleneck, the browser can heuristically download followup layers before it is certain they are needed (based on viewport size, image dimensions, etc).

    Another advantage is that for "non-responsive" images, the browser simply downloads the image itself. There's no need to declare in the markup if an image is responsive or not.


    When compared to simple image fetching, image fetching with the technique described above may suffer up to a single RTT delay, when "Range" is supported. If "Range" is not supported, the delay per image may go up, even though it is not likely that it will reach the maximal "RTT per layer" performance cost. This disadvantage is probably negligible compared to the time savings that will result from fewer bytes passing over the wire.

    On the other hand, for retina display devices that download all the image's layers, this delay may be noticeable.


Responsive image format

Can't be done?

Throughout the responsive images debate, several people have claimed that salvation will come in the form of a new image format that will enable images that are automagically responsive.

My response to these claims was always that it can't be done.

It can't be done since the browser needs to download the image in order to analyze which parts of the image it needs. Yes, the browser can start to download the image and reset the connection once it has enough data to display the image properly, but that will always download much more than is actually necessary (not to mention that it's an extremely ugly solution).

Also, introducing new image formats to the web is less than trivial and extremely slow at best (If you're not convinced, see Mozilla's response to WebP a year ago.)

And don't get me started on the lack of fallback mechanisms for new image formats :)

So, in one of the latest Twitter discussions, when the subject came up, I was about to make all the above claims once again. But then I realized I was wrong all along. It can be done, it can be done gracefully, and it can be done with current image formats.


The web already has a "responsive" format, which is progressive JPEG. The only issue at hand is getting the browsers to download only the necessary bytes of the progressive JPEG.

Here's how we can do this:

  • The author will compress the progressive JPEG with multiple scans
  • The browser would download an initial buffer of each image (10-20K), using the "Range" request header
  • This initial buffer will contain the image's dimensions and (optionally) a "scan info" JPEG comment that will state the byte breakpoints of each one of the JPEG scans (slightly similar to the MP4 video format meta data)
  • If the image is not a progressive JPEG, the browser will download the rest of the image's byte range
  • When the scan info comment is present, the browser will download only the byte range that it actually needs, as soon as it knows the image's presentation size.
  • When the scan info comment is not present, the browser can rely on dimension based heuristics and the "Content-Length" header to try and guess how many bytes it needs to really download.
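The steps above could translate into something like the following sketch. The shape of the "scan info" data is an assumption: here each entry pairs the byte offset where a scan ends with the largest display width that scan is considered good enough for, which is exactly the kind of heuristic browsers would have to come up with. The byte counts are illustrative round numbers.

```python
def bytes_needed(scan_info, display_width, content_length):
    """How many leading bytes of the JPEG are needed for display_width.

    scan_info: list of (end_offset, max_display_width), sorted by end_offset.
    Falls back to the whole file when no scan is good enough.
    """
    for end_offset, max_width in scan_info:
        if display_width <= max_width:
            return end_offset
    return content_length


def followup_range(already_fetched, scan_info, display_width, content_length):
    """Range header value for the second request, or None if none is needed."""
    needed = bytes_needed(scan_info, display_width, content_length)
    if needed <= already_fetched:
        return None  # the initial buffer already covers it
    return f"bytes={already_fetched}-{needed - 1}"  # Range ends are inclusive


# Hypothetical scan info for a 1920px wide, 217,000 byte progressive JPEG
scan_info = [(17_000, 240), (21_000, 480), (57_000, 960)]
initial_buffer = 20_000

small = followup_range(initial_buffer, scan_info, 240, 217_000)
medium = followup_range(initial_buffer, scan_info, 480, 217_000)
full = followup_range(initial_buffer, scan_info, 1920, 217_000)
```

In this example a 240px wide rendering needs no second request at all, a 480px one asks for `bytes=20000-20999`, and a full-size rendering asks for the rest of the file.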


Advantages

  • DRY and easy to maintain - no need to sync the URLs with the correct resolution between the image storage and the HTML/CSS. Only a single image must be stored on the server, which will significantly simplify authors' lives.
  • The image optimization can be easily automated.
  • Any progressive image used in a responsive design (or one whose display dimensions are smaller than its real dimensions) can benefit from this, even if the author is not aware of responsive images.


Disadvantages

  • The optimization burden with this approach will lie on the shoulders of browser vendors. Browsers will have to come up with heuristics that correlate between number of bits per scan and the "visually acceptable" output dimensions.
  • Two requests for every large image might have a negative effect on the download speed & uplink bandwidth. Browser vendors will have to make sure it won't negatively affect speed. SPDY can resolve the uplink bandwidth concerns.
  • It is not certain that savings using the "responsive progressive" method are identical to savings possible using resize methods. If it proves to be an issue, it can probably be optimized in the encoder.


This proposal does not claim that all the current <picture> tag efforts are unnecessary. They are required to enable "art direction responsiveness" for images, and give authors that need it more control over the actual images delivered to users.

With that said, most authors might not want to be bothered with the markup changes required. A new, complementary image convention (not really a new format) that will provide most of the same benefits, and can be applied using automated tools can have a huge advantage.

It is also worth noting that I did not conduct a full byte size comparison research between the responsive progressive method and full image resizing. See the example below for an anecdotal comparison using a single large image.


All of the images in the responsive progressive example are a single progressive JPEG that was truncated after several scans.

This is an attempt to simulate what a single progressive JPEG might look like at various resolutions when only a part of its scans are used, and how much the browsers will have to download.

We can see here that the thumbnail image below is significantly larger as responsive progressive than it is as a resized thumbnail, and the largest image is about the same size.

IMO, the responsive progressive images look significantly better than their resized counterparts, so there's probably room for optimization here.

The original image is 1920x1280, weighs 217K and can be found here (It is part of Ubuntu's default wallpapers package)

240x160 - responsive progressive - 17K

240x160 - resize - 5.2K

480x320 - responsive progressive - 21K

480x320 - resize - 15K

960x640 - responsive progressive - 57K

960x640 - resize - 59K
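To put the numbers above side by side, this small snippet computes how much larger (or smaller) each responsive progressive version is compared to its resized counterpart. It only uses the sizes listed in this post, in K, for this single image.

```python
# (responsive progressive, resize) sizes in K, as listed above
sizes_kb = {
    "240x160": (17, 5.2),
    "480x320": (21, 15),
    "960x640": (57, 59),
}


def overhead(progressive, resized):
    """Size of the progressive version relative to the resized one."""
    return round(progressive / resized, 2)


ratios = {dims: overhead(*pair) for dims, pair in sizes_kb.items()}
# ratios == {"240x160": 3.27, "480x320": 1.4, "960x640": 0.97}
```

So the thumbnail costs more than three times as many bytes, while the largest version is slightly smaller than the resized image.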

Update: I just saw a slightly similar proposal here. My main problem with it is that a new format will take too long to implement and deploy, and will have no fallback for older browsers.

Simpler responsive images proposal


Adding a media attribute that supports queries to the base tag is all that's required to have responsive images with no performance penalty.

The thread

After my last post, Nicolas Gallagher pointed me towards a mail thread on the html-public mailing list that discusses appropriate solutions to the responsive images problem.*

There were a few suggested solutions there:

  • Each image tag will have child source tags with a media attribute in each
  • A new "image" format that will deliver the browser the real image URLs according to dimensions. The browser will then fetch the image it needs according to that info
  • Web authors will use a progressive image format and browsers will terminate the connection once they have enough data to properly reconstruct a downsized image
  • Allow media attribute on all elements
  • Add HTTP headers that will allow content negotiation

In my opinion, only the two solutions that involve the media attribute can resolve the problem with a front-end only solution, where content stays in the HTML (leaving the CSS cacheable independently from content) without any performance problems. The downside of both is that they add a lot of repeating code to the HTML. Each resource will have to be defined several times while adding a media query to each resource. A lot of copy-pasting...


That got me thinking of a "conditional comment"-like media query syntax inside the HTML that would let authors define a different base tag according to dimensions. Then I realized that we don't need the fancy new syntax I just made up. All we need is a media attribute in the base tag that supports queries.

A base tag with a media attribute will enable us to set the base for relative URLs according to dimensions, so we would be able to simply have small images in one directory and larger images in another one, without having to specify that on a per-image basis.

Also, adding media attribute only to the base tag will probably be simpler to implement than adding it to all resources.

While that solution won't provide maximal flexibility in determining the different resolution URLs, I believe it is good enough to resolve the responsive images problem in a clean, pure front-end manner.
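A rough simulation of how a browser could resolve image URLs under this proposal is sketched below. The media attribute on the base tag does not exist today, so everything here is an assumption for illustration: media queries are reduced to a simple "max-width" check, and the base URLs are placeholders.

```python
from urllib.parse import urljoin


def pick_base(bases, viewport_width):
    """Pick the first base whose (simplified) media query matches.

    bases: list of (max_width, base_url); max_width None means "always".
    """
    for max_width, base_url in bases:
        if max_width is None or viewport_width <= max_width:
            return base_url
    raise ValueError("no base matches this viewport")


def resolve(bases, viewport_width, relative_url):
    """Resolve a relative image URL against the matching base."""
    return urljoin(pick_base(bases, viewport_width), relative_url)


# Hypothetical equivalent of two base tags, one with a max-width: 480px query
bases = [
    (480, "https://example.com/images/small/"),
    (None, "https://example.com/images/large/"),
]
phone_url = resolve(bases, 320, "hero.jpg")
desktop_url = resolve(bases, 1280, "hero.jpg")
```

The same relative URL in the markup ends up pointing at the small or the large directory, without any per-image declarations.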


* I actually saw the initial suggestions there a couple of months ago, but missed the followup responses

Responsive images - hacks won't cut it


Responsive images are important for mobile performance. Hacks may solve the problem, but they come with their own performance penalty. Browser vendors must step up to create a standard, supported method.

What we have

There have been several techniques published lately that enable responsive images using various hacks:

  • Harry Roberts suggested using background images & media queries to deliver larger images to desktop browsers
  • Keith Clark suggested using JS at the document head to plant cookies that will then send the device dimensions to the server with every image request. The server can then serve different image dimensions on the same image URL
  • Yet another approach is that of the Filament Group, which is based on dynamically modifying the base tag according to device dimensions

Not good enough

The reason we need responsive images in the first place is to avoid downloading excessively large images and avoid the performance penalty that such excessive download incurs.

All current techniques avoid some of this performance penalty, but come with new performance issues. Using the same URL for images in different dimensions means that the images can't be cached by intermediary cache servers. Using only/mainly background images means that image downloads won't start until the CSS has been downloaded and parsed. It also means that the CSS that contains content-related image URLs cannot be long-term cacheable. Dynamically modifying the base tag is generally frowned upon by browser vendors since it can mess up the preload scanner, which loads external resources before all CSS and JS have been downloaded and run.

All in all, since techniques that modify the URL prevent the preload scanner from working properly, and techniques that don't modify the URL prevent caching, I don't see how a responsive images hack can avoid a performance penalty of its own, which kinda misses the point.

What we need

We need browser vendors to step up and propose (i.e. implement ☺) a standard method to do this. A standard method that will be supported by the preload scanner, and therefore, won't delay image download and won't have a performance penalty.

I made a proposal a couple of months ago for such a method, in response to Nicolas Gallagher's and Robert Nyman's proposals, but basically any method that will keep the URL maintenance burden in the HTML, keep both CSS & images cacheable, and will have no performance penalty will be welcome.



  • I'm Yoav Weiss.
  • I'm a developer.
  • I have a thing for web performance.
  • I live on a hill.
  • I have 3 small kids.
  • I don't sleep much.