Yoav's blog thing


Responsive Image Container

It's been a year since I last wrote about it, but the dream of the "magical image format" that will solve world hunger and/or the responsive images problem (whichever one comes first) lives on.

A few weeks back I started wondering if such an image format can be used to solve both the art-direction and resolution-switching use-cases.

I had a few ideas on how this can be done, so I created a prototype to prove that it's feasible. This prototype is now available, ready to be tinkered with.

In this post I'll try to explain what this prototype does, what it cannot do, how it works, and its advantages and disadvantages over markup solutions. I'll also try to de-unicorn the responsive image format concept, and make it more tangible and less magical.

# You've got something against markup solutions?

No, I don't! Honest! Some of my best friends are markup solutions.

I've been part of the RICG for a while now, prototyping, promoting and presenting markup solutions. Current markup solutions (picture and srcset) are great and can cover all the important use cases for responsive images, and if it were up to me, I'd vote for shipping both picture and srcset (in its resolution switching version) in all browsers tomorrow.

But the overall markup based solution has some flaws.

Here's some of the criticism I've been hearing for the last year or so when talking about responsive image markup solutions.

# Too verbose

Markup solutions are by definition verbose, since they must enumerate all the various resources. When art-direction is involved, they must also state the breakpoints, which adds to that verbosity.

# Mixing presentation and content

Art-direction markup solutions need to keep layout breakpoints in the markup. That mixes presentation and content, and means that layout changes will force markup changes.

There have been constructive discussions on how this can be resolved, by bringing back the MQ definitions into CSS, but it's not certain when any of this will be defined and implemented.

# Define viewport based breakpoints

This one is heard often from developers. For performance reasons, markup based solutions are based on the viewport size, rather than on the image's dimensions. Since the images' layout dimensions are not yet known to the browser by the time it starts fetching images, it cannot rely on them to decide which resource to fetch.

For developers, that means that some sort of "viewport=>dimensions" table needs to be created on the server-side/build-step or inside the developer's head in order to properly create images that are ideally sized for certain viewport dimensions and layout.

While a build step can resolve that issue in many cases, it can get complicated in cases where a single component is used over multiple pages, with varying dimensions in each.
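To make the "viewport=>dimensions" table a bit more concrete, here is a minimal sketch of what a build step might compute. The breakpoints and percentages in `LAYOUT_RULES` and the function name are made up for illustration; a real layout would have its own rules, and a component reused across pages would need one such table per page.

```python
# Hypothetical layout rules: (minimum viewport width in CSS px,
# percentage of the viewport width that the image occupies).
LAYOUT_RULES = [
    (0, 100),    # single column: the image spans the full viewport
    (600, 50),   # two columns
    (1000, 33),  # three columns
]


def image_width_for_viewport(viewport_width, rules=LAYOUT_RULES):
    """Return the image's layout width (in px) for a given viewport width."""
    percent = rules[0][1]
    for min_width, rule_percent in rules:
        # Rules are sorted by min_width, so the last matching rule wins.
        if viewport_width >= min_width:
            percent = rule_percent
    return viewport_width * percent // 100


for viewport in (320, 800, 1200):
    print(viewport, "=>", image_width_for_viewport(viewport))
```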

# Result in excessive download in some cases

OK, this one is something I hear mostly in my head (and from other Web performance freaks on occasion).

From a performance perspective, any solution that's based on separate resources for different screen sizes/dimensions requires re-downloading of the entire image if the screen size or dimensions change to a higher resolution than before. Since it's highly possible that most of that image data is already in the browser's memory or cache, re-downloading everything from scratch makes me sad.

All of the above made me wonder (again) how wonderful life would be if we had a file format based solution, that can address these concerns.

# Why would a file format do better?

This is my attempt at a simpler, file format based solution that will let Web developers do much less grunt work, avoid downloading useless image data (even when conditions change), while keeping preloaders working.

# Why not progressive JPEG?

Progressive JPEG can fill this role for the resolution switching case, but it's extremely rigid.

There are strict limits on the lowest image quality, and from what I've seen, it is often too data-heavy. The minimal difference between resolutions is also limited, and doesn't give enough control to encoders that want to do better.

Furthermore, progressive JPEG cannot do art-direction at all.

# What would it look like?

A responsive image container, containing internal layers that can be either WebP, JPEG-XR, or any future format. It uses resizing and crop operations to cover both the resolution switching and the art direction use cases.

The decoder (e.g. the browser) will then be able to download just the number of layers it needs (and their bytes) in order to show a certain image. Each layer will provide enhancement on the layer before it, giving the decoder the data it needs to show it properly in a higher resolution.
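As a rough sketch of that decision, assume the decoder already knows each layer's resulting width and encoded size. The `Layer` class, the `layers_needed` helper and the numbers below are hypothetical and not taken from the prototype.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    width: int      # pixel width of the image once this layer is applied
    byte_size: int  # encoded size of this layer alone


def layers_needed(layers, required_width):
    """Return (layer_count, total_bytes) for the fewest layers that reach required_width."""
    total = 0
    for count, layer in enumerate(layers, start=1):
        total += layer.byte_size
        if layer.width >= required_width:
            return count, total
    # No layer is wide enough: use all of them and let the browser upscale.
    return len(layers), total


# Made-up numbers, for illustration only.
LAYERS = [Layer(320, 12_000), Layer(640, 20_000), Layer(1280, 45_000)]
print(layers_needed(LAYERS, 640))
```

Since every layer builds on the one before it, a decoder that already holds the first layer and later needs 640 pixels only has to fetch the second layer's bytes, rather than a whole new image.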

# How does it work?

Support for resolution switching is obvious in this case, but art-direction can also be supported by positioning the previous layer on the current one and being able to give it certain dimensions.

Let's look at some examples:

# Art-direction

Here's a photo that is often used in discussions of the art-direction use-case (I've been too lazy to search for a new one):

Obama in a jeep factory - original with context

Let's take a look at what the smallest layer would look like:

Obama in a jeep factory - cropped to show only Obama

That's just a cropped version of the original - nothing special.

Now one layer above that:

Obama in a jeep factory - some context + diff from previous layer

You can see that pixels that don't appear in the previous layer are shown normally, while pixels that do appear in it contain only the difference between them and the equivalent pixels in the previous layer.

And the third, final layer:

Obama in a jeep factory - full context + diff from previous layer
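The reconstruction step for these layers can be sketched roughly as follows. This is a simplified illustration and not the prototype's code: it assumes grayscale pixels stored as nested lists, a diff stored as a signed per-pixel offset, and a previous layer that is positioned without being resized. The function name and the `offset_x`/`offset_y` parameters are hypothetical.

```python
def apply_art_direction_layer(previous, layer, offset_x, offset_y):
    """Rebuild the larger image from the previous image and the current layer.

    previous: 2D list of grayscale values (the smaller, cropped image).
    layer:    2D list covering the larger canvas. Where the previous image
              is positioned it holds signed differences; everywhere else
              it holds ordinary pixel values.
    """
    result = [row[:] for row in layer]
    for y, prev_row in enumerate(previous):
        for x, prev_value in enumerate(prev_row):
            diff = layer[offset_y + y][offset_x + x]
            # Previous pixel plus its diff, clamped to the valid 0..255 range.
            result[offset_y + y][offset_x + x] = max(0, min(255, prev_value + diff))
    return result


previous = [[100, 110]]           # 2x1 crop
layer = [[10, 20, 30, 40],        # new context pixels
         [50, 5, -10, 60]]        # context, two diffs, context
print(apply_art_direction_layer(previous, layer, offset_x=1, offset_y=1))
# → [[10, 20, 30, 40], [50, 105, 100, 60]]
```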

# Resolution switching

A high resolution photo of a fruit:

iPhone - original resolution

The first layer - showing a significantly downsized version

iPhone - significantly downsized

The second layer - A diff between a medium sized version and the "stretched" previous layer

iPhone - medium sized diff

And the third layer - containing a diff between the original and the "stretched" previous layer

iPhone - full sized diff
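The "stretch, then add the diff" step can be sketched like this. It is a simplified illustration under stated assumptions: grayscale pixels in nested lists, nearest-neighbour stretching by an integer factor, and a diff stored as a signed per-pixel offset. The prototype's actual resizing filter and diff encoding are described in the repo and may well differ; `stretch` and `apply_resolution_layer` are hypothetical names.

```python
def stretch(image, factor):
    """Nearest-neighbour upscale of a 2D grayscale list by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def apply_resolution_layer(previous, diff, factor=2):
    """Upscale the previous image, then add the layer's per-pixel diff."""
    stretched = stretch(previous, factor)
    return [
        [max(0, min(255, base + delta)) for base, delta in zip(base_row, diff_row)]
        for base_row, diff_row in zip(stretched, diff)
    ]


small = [[100, 200]]              # 2x1 first layer
diff = [[0, 5, -5, 0],            # 4x2 diff against the stretched image
        [2, 0, 0, -2]]
print(apply_resolution_layer(small, diff))
# → [[100, 105, 195, 200], [102, 100, 200, 198]]
```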

If you're interested in more details you can go to the repo. More details on the container's structure are also there.

# But I need more from art-direction

I've seen cases where rotation and image repositioning are required for art-direction cases. It was usually in order to add a logo/slogan at different locations around the image itself, depending on the viewport dimensions.

This use-case is probably better served by CSS. CSS transforms can handle rotation and CSS positioning, along with media specific background images, can probably handle the rest.

If your art-direction case is special, and can't be handled by either one of those, I'd love to hear about it.

# How will it be fetched?

That's where things get tricky. A special fetching mechanism must be created in order to fetch this type of image. I can't say that I have that part all figured out, but here's my rough idea on how it may work.

My proposed mechanism relies on HTTP ranges, similar to the fetching mechanisms of the <video> element, when seeks are involved.

More specifically:

The above mechanism will increase the number of HTTP requests, which in an HTTP/1.1 world will probably introduce some delay in many cases.
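For reference, this is roughly what a single ranged fetch looks like at the HTTP level. The sketch only builds the request and parses a `Content-Range` response header; it does not talk to the network. The URL, the 4096-byte initial range and the helper names are placeholders, not values from the prototype.

```python
import re
from urllib.request import Request


def build_range_request(url, first_byte, last_byte):
    """Build (but do not send) a GET request for an inclusive byte range."""
    return Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


def parse_content_range(header_value):
    """Parse 'bytes 0-4095/120000' into (first, last, total); total is None for '*'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", header_value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {header_value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


# Hypothetical URL and initial range size.
request = build_range_request("https://example.com/photo.ric", 0, 4095)
print(request.get_header("Range"))
print(parse_content_range("bytes 0-4095/120000"))
```

A server that supports ranges answers with `206 Partial Content`, and the `Content-Range` header tells the client which bytes it received and how large the whole resource is.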

That mechanism can be optimized by defining a manifest that would describe the image resources' byte ranges to the browser. The idea for adding a manifest was proposed by Cyril Concolato at last year's TPAC, and it makes a lot of sense, borrowing from our collective experience with video streaming. It can enable browsers to avoid fetching an arbitrary initial range (at least once the manifest itself has been downloaded).

Adding a manifest will prevent these extra requests for everything requested after layout, and may help to prevent them (using heuristics) even before layout.

Creating a manifest can be easily delegated to either build tools or the server side layer, so devs don't have to manually deal with these image specific details.
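No manifest format has been defined, so the following is purely an illustration of the idea. It assumes a made-up JSON manifest that lists each layer's offset and length, and layers that are stored contiguously and in order, so any run of consecutive layers maps to a single byte range.

```python
import json

# Hypothetical manifest; field names and numbers are invented for this sketch.
MANIFEST_JSON = """
{"layers": [
  {"width": 320,  "offset": 64,    "length": 12000},
  {"width": 640,  "offset": 12064, "length": 20000},
  {"width": 1280, "offset": 32064, "length": 45000}
]}
"""


def range_for_layers(manifest, first_layer, last_layer):
    """Return a Range header value covering layers first_layer..last_layer (inclusive)."""
    layers = manifest["layers"][first_layer:last_layer + 1]
    start = layers[0]["offset"]
    # HTTP byte ranges are inclusive, hence the -1.
    end = layers[-1]["offset"] + layers[-1]["length"] - 1
    return f"bytes={start}-{end}"


manifest = json.loads(MANIFEST_JSON)
print(range_for_layers(manifest, 0, 1))  # first two layers in one request
print(range_for_layers(manifest, 2, 2))  # later, only the missing third layer
```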

# Can't we simply reset the connection?

In theory we can address this by fetching the entire image, and resetting the connection once the browser has all the necessary data, but that will most likely introduce serious performance issues.

The problems with resetting a TCP connection during a browsing session are:

# Downsides of this approach?

# So, should we dump markup solutions?

Not at all. This is a prototype, showing how most of the responsive images use-cases would have been solved by such a container.

Reaching consensus on this solution, defining it in detail and implementing it in an interoperable way may be a long process. The performance implications on HTTP/1.1 sites and decoding speed still need to be explored.

I believe this may be a way to simplify responsive images in the future, but I don't think we should wait for the ideal solution.

# To sum it up

If you just skipped here, that's OK. It's a long post.

Just to sum it up, I've demonstrated (along with a prototype) how a responsive image format can work, and can resolve most of the responsive images use cases. I also went into some detail about which other bits would have to be added to the platform in order to make it a viable solution.

I consider this to be a long term solution since some key issues need to be addressed before this solution can be practical.
IMO, the main issue is decoding performance, with download performance impact on HTTP/1.1 being a close second.

I think it's worthwhile to continue to explore this option, but not wait for it. Responsive images need an in-the-browser, real-life solution two years ago today, not two years from now.

