Adapting without assumptions

There has been a lot of talk recently about the Network Info API.

Paul Kinlan published an article about using Service Worker along with the Network Info API to send network information up to the server and let the server adapt its responses to these network info bits. There is also an intent to implement the downlinkMax attribute in the Blink rendering engine.

Since I have Opinions™ on the matter, and Twitter and mailing lists aren’t always the ideal medium, I wrote them down here.

This is a lengthy post, so if you don’t have time to read through it all, its claims are:

  • The current NetInfo API is not providing useful info.
  • We should improve the current API (proposal).
  • We should improve our overall capabilities to adapt content based on user conditions, beyond just network conditions.

Current NetInfo API doesn’t expose what devs need

The current API is built around the following attributes:

  • type - indicates the “type” of network, with rather coarse granularity. e.g. “cellular” vs. “wifi”.
  • downlinkMax - indicates the maximum downlink speed of the underlying first-hop technology or an estimate of it. It has finer granularity, but has a certain duality to it, where the developer is not sure if they are getting a value based on a set of predefined values or a bandwidth estimate which is more likely to be related to reality.
  • onchange - An event handler that indicates that the network has changed, so that the app can somehow change behavior as a result.

The problem with the above is that it rarely provides Web developers with useful and actionable data without them having to make huge (and often false) assumptions about what that info means for the things they actually care about (and which are, for most cases, not the things this API exposes).

If you take a closer look at the downlinkMax table you can see that the info you get from it is dubious at best. If your user is on an EDGE network, you would be led to think that their available download speed is 384 kbps. While they most probably don’t have that much bandwidth at their disposal, you can use that in order to figure out that they are on a not-so-great network, and change the resources you serve them accordingly.

But, what if they are WiFi-tethering over their 2G phone? In that case, you’d be led to think that the connection type is “WiFi” and the speed is capped at 11 Mbps. Not too shabby.

Except that the user would be experiencing even worse network conditions in the latter case than in the former one, without the developer knowing anything about it.

There are many other cases where looking at downlinkMax will lead you to the wrong conclusions. For example, take the case where your users are on an extremely lossy WiFi network (AKA: “hotel/conference WiFi”) where their effective bandwidth is very low. Or the case where they are on an HSDPA network which in theory can reach 14.3 Mbps, but in reality they are sharing a cell with thousands of other users, all trying to download cat-based entertainment while waiting for the bus/train/plane. The cell’s bandwidth is then thinly divided between all those users, and the cell’s backhaul network (which is fetching those cats from the landline internet) is saturated as well.

In fact, the only case where downlinkMax is useful is in the “user is on an EDGE network” case. For everything else, you’re out of luck: bad or tethered WiFi, 3G with poor coverage, poor backhaul, etc. will all present themselves as pretty good networks. That means that we could effectively replace downlinkMax with an isUserOnEdge boolean.
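To make the failure mode concrete, here is a minimal sketch of the naive adaptation logic that downlinkMax invites. The ConnectionSnapshot shape, the pickImageQuality function and the 1 Mbps cut-off are made up for illustration and are not part of the API; only the two downlinkMax figures (0.384 Mbps and 11 Mbps) come from the examples above.

```typescript
// Hypothetical snapshot of what the current API reports; not the real interface.
interface ConnectionSnapshot {
  type: "cellular" | "wifi";
  downlinkMax: number; // Mbps, theoretical first-hop maximum
}

type ImageQuality = "low" | "high";

// Naive logic: trust the first-hop maximum. The 1 Mbps cut-off is arbitrary.
function pickImageQuality(conn: ConnectionSnapshot): ImageQuality {
  return conn.downlinkMax < 1 ? "low" : "high";
}

// A phone connected directly to an EDGE network.
const onEdge: ConnectionSnapshot = { type: "cellular", downlinkMax: 0.384 };

// A laptop tethered over WiFi to a phone that is itself on a 2G network.
const tetheredOver2G: ConnectionSnapshot = { type: "wifi", downlinkMax: 11 };

console.log(pickImageQuality(onEdge));         // "low"
console.log(pickImageQuality(tetheredOver2G)); // "high"
```

The tethered user gets the heavy resources even though their end-to-end path is at least as slow as the first user’s, and nothing in the exposed values lets the developer tell the two apart.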

Even if we look at possible downlinkMax improvements using a bandwidth estimate of some sort, according to the current spec:

  • That estimate would be of the first hop, which means it cannot take into account backhaul congestion, tethering and other similar scenarios.
  • There’s no way for developers to distinguish between a first-hop bandwidth estimate, and a theoretical maximum bandwidth which will never be reached.

All of which leads me to believe that downlinkMax is not providing the information that developers actually need, and makes me worry that the info will be abused by developers (due to lack of better network information) if we expose it.

So, what do developers need?

The general use-case that developers are trying to tackle here is that of content adaptation to the user’s condition. I’d claim that the main use-case would be to serve rich content to devices that can handle it right now, while providing a decent and fast experience to devices that can’t handle the rich content, due to some constraints.

Some of the specific use-cases I heard people mention are:

  • Download smaller/other resources when network conditions are bad.
    • That is the use-case most often cited. While the “smaller resource” part of it can be partly resolved with srcset and progressive video loading, that often means serving physically smaller resources. What the developer actually wants is to apply harsher compression at the expense of quality, which would still be better than serving smaller resources and upscaling them. There can also be cases where we would want to serve different content based on network conditions (e.g. replace video ads with static ads).
  • Download smaller/other resources when a low-end device can’t handle the load.
    • Low-end devices with very little memory and processing power can’t always handle the load of rendering the full Web site with all its images, videos and scripts. In some cases developers need to detect that and send a simplified version.
    • See Tim Kadlec’s excellent “Reaching everyone, fast” talk for more details on that use-case.
  • Avoid syncing/downloading large chunks of data.
    • Some Web apps need to sync or download a lot of data, which may be costly, battery draining or clog the device’s storage, depending on the user’s conditions and their device. Developers need a way to know when the user is in conditions where they are likely to get pissed at them for starting such a costly operation.
  • Warn users before heavy downloads.
    • Related to the last use-case, having a standard way to let users know that a large download is about to take place and allowing them to avoid it, would enable the browser to handle that “permission” and may be used to avoid bugging the user about that in the future.

Now, if we take these use-cases into consideration, what are the constraints that we need to expose to developers that would enable them to successfully tackle these use cases?

I think the list would include:

  • Actual network conditions
  • User preference - Does the user prefer fast delivery over a heavy but fancy one?
  • Device capabilities - Can the device handle the resources I’m sending its way, or will it crash and burn on them?
  • Battery - If battery is scarce, maybe the user doesn’t need that fancy animation, and they just want the address of the place they’re trying to get to?
  • Monetary cost of traffic (and if the user considers that cost expensive)

Let’s dive into each one of those.

Network conditions

The current NetInfo API talks about exposing network information, basically mimicking the Android APIs that can give an app developer the same info. So, as we’ve seen, this info gives the developer the rough network type and the theoretical bandwidth limits of the network the user is on.

But as a developer, I don’t much care about which first-hop radio technology is used, nor what its theoretical limit is. What I want to know is “Is the end-to-end network fast enough to deliver all those rich (read: heavy) resources in time for them to provide a pleasant user experience rather than a burdensome one?”

So, we don’t need to expose information about the network, as much as we need to expose the product of the overall end-to-end network conditions.

What developers need to know is the network conditions that the user is experiencing, and in most cases, what their effective bandwidth is.

While that’s hard to deliver (and I once wrote why measuring bandwidth is hard), the good folks working on the Google Chrome net stack are working to prove that hard != impossible. So, it looks like having an in-the-browser end-to-end network estimation is no longer a pipe dream.

Now, once we’ve estimated the network conditions, should we expose the raw values?

I believe we shouldn’t, at least not as a high-level “your bandwidth is X” single number.

The raw network information of incoming effective bandwidth and round-trip-times can be overwhelming, and the potential for misuse is too high. It’s also very likely to change rapidly, causing non-deterministic code behavior if exposed through script, and huge variance if exposed through Client-Hints.

What I believe we need to expose is a set of actionable, discrete values, and browsers would “translate” the stream of raw network data into one of those values. That would also enable browsers to start with rough bandwidth estimations, and iterate on them, making sure they’re more accurate over time.

As far as the values themselves, I propose something like unusable, bad, decent, good and excellent, because naming is hard.

Having discrete and imprecise values also has the advantage of enabling browsers to evolve what these values mean over time, since today’s “decent” may very well be tomorrow’s “bad”. We already have a Web platform precedent for similar discrete values as part of the update-frequency Media Query.

As a bonus, imprecise values would significantly decrease the privacy concerns that exposing the raw bandwidth would raise.
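As a rough sketch of what that “translation” could look like inside a browser: the snippet below buckets a raw estimate into the five proposed values, and only reports a change once it has been seen several times in a row, so the exposed value doesn’t flap with every noisy sample. The RawEstimate shape, every threshold and the “three samples in a row” rule are assumptions picked for illustration, not anything a browser actually implements.

```typescript
type NetworkQuality = "unusable" | "bad" | "decent" | "good" | "excellent";

// Hypothetical raw estimate that a browser might hold internally.
interface RawEstimate {
  bandwidthKbps: number; // effective end-to-end downlink estimate
  rttMs: number;         // round-trip time estimate
}

// Illustrative thresholds only; a browser could retune them over time.
function toNetworkQuality(e: RawEstimate): NetworkQuality {
  if (e.bandwidthKbps < 50 || e.rttMs > 3000) return "unusable";
  if (e.bandwidthKbps < 400 || e.rttMs > 1000) return "bad";
  if (e.bandwidthKbps < 2000 || e.rttMs > 300) return "decent";
  if (e.bandwidthKbps < 10000 || e.rttMs > 100) return "good";
  return "excellent";
}

// Returns a function that feeds in raw samples and reports the exposed value.
// A new value is exposed only after `needed` consecutive samples agree on it.
function createStableQuality(initial: NetworkQuality, needed = 3) {
  let current = initial;
  let candidate = initial;
  let streak = 0;

  return (sample: RawEstimate): NetworkQuality => {
    const next = toNetworkQuality(sample);
    if (next === current) {
      // Back to the exposed value: forget any pending change.
      candidate = current;
      streak = 0;
    } else if (next === candidate) {
      streak += 1;
    } else {
      candidate = next;
      streak = 1;
    }
    if (streak >= needed) {
      current = candidate;
      streak = 0;
    }
    return current;
  };
}
```

Because developers would only ever see the bucket name, the thresholds and the smoothing can be changed later without breaking any site that relies on the values.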

User preferences

We already have a proposal for this one: the Save-Data header, which is part of the Client-Hints specification. It might be a good idea to also expose that to JavaScript.

The main question that remains here is how do we get the user’s preferences. As far as I understand, the idea in Chrome is to take advantage of a user’s opt-in to their compression proxy as an indication that they are interested in data savings in general.

That’s probably a good start, but we can evolve that to be so much smarter over time, depending on many other factors that the browser knows about the user (e.g. geography, data saving settings at the OS level, etc.).
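On the server side, acting on that preference can be very little code. Here is a minimal sketch, assuming the request headers are available as a plain object; RequestHeaders, prefersReducedData, heroImageFor and the two file names are hypothetical, and only the header name and its “on” value come from the proposal.

```typescript
// Minimal header bag; real servers expose headers in framework-specific ways.
type RequestHeaders = Record<string, string | undefined>;

// True when the request carries "Save-Data: on".
function prefersReducedData(headers: RequestHeaders): boolean {
  // Header names are case-insensitive, so normalise before comparing.
  return Object.keys(headers).some((name) => {
    const value = headers[name];
    return (
      name.toLowerCase() === "save-data" &&
      value !== undefined &&
      value.trim().toLowerCase() === "on"
    );
  });
}

// Hypothetical file names: same dimensions, different compression levels.
function heroImageFor(headers: RequestHeaders): string {
  return prefersReducedData(headers) ? "hero-q40.jpg" : "hero-q85.jpg";
}
```

Note that the response here keeps the image dimensions and only changes the compression level, which is exactly the kind of adaptation that srcset alone can’t express.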

Device capabilities

The current state of the art at detecting old and busted devices and avoiding sending them resources that they would choke on (due to constrained CPU and memory) is dubbed “cutting the mustard”. While a good effort to make do with what we have today, it is (again) making a bunch of potentially false assumptions.

The “cutting the mustard” method means detecting the presence of modern APIs and concluding from their absence that the device in question is old and busted. While their absence can indicate that, their presence doesn’t mean that the device is a full-powered, high-end smartphone. There are many low-end devices out there today with shiny new FirefoxOS installations. Any Android 4 phone may have an always-up-to-date Chrome, regardless of its memory and CPU (which can be extremely low).

Bottom line is: we cannot assume the state of the user’s hardware from the state of their software.
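A small sketch shows why. The check below mimics a typical “cutting the mustard” test, but runs against a made-up description of a device instead of the real document and window objects so that it can be read in isolation. DeviceUnderTest, the three sample devices and their memory figures are all hypothetical.

```typescript
// Hypothetical description of a device: the APIs its browser exposes,
// and how much memory the hardware has.
interface DeviceUnderTest {
  supportedApis: string[];
  memoryMb: number;
}

// The kind of APIs a "cutting the mustard" test typically looks for.
const MODERN_APIS = ["querySelector", "localStorage", "addEventListener"];

// Passes when every modern API is present. Note that memoryMb plays no part.
function cutsTheMustard(device: DeviceUnderTest): boolean {
  return MODERN_APIS.every((api) => device.supportedApis.indexOf(api) !== -1);
}

const oldPhone: DeviceUnderTest = { supportedApis: ["addEventListener"], memoryMb: 256 };
const cheapNewPhone: DeviceUnderTest = { supportedApis: MODERN_APIS, memoryMb: 256 };
const flagship: DeviceUnderTest = { supportedApis: MODERN_APIS, memoryMb: 3072 };

console.log(cutsTheMustard(oldPhone));      // false
console.log(cutsTheMustard(cheapNewPhone)); // true
console.log(cutsTheMustard(flagship));      // true
```

The cheap phone with an up-to-date browser and the flagship are indistinguishable to the test, even though only one of them can comfortably handle the full experience.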

On the other hand, exposing all the different metrics that determine the device’s capacity is tricky. Do we expose raw CPU cycles? Raw memory? What should happen when CPU or memory are busy with a different app?

The solution to that is not very different from the one for network conditions. We can expose a set of discrete and actionable values, that can evolve over time.

The browsers can estimate the state of current hardware and current available processing power and memory, and “translate” that into a “rank” which would give developers an idea of what they are dealing with, and allow them to adapt their sites accordingly.

Lacking better names, the values could be minimal, low, mid and high.
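Here is a minimal sketch of such a ranking, assuming the browser has some estimate of currently available memory and processing power. The HardwareEstimate shape, the cpuScore scale and all thresholds are invented for illustration; how a browser would actually measure these is an open question.

```typescript
type DeviceRank = "minimal" | "low" | "mid" | "high";

// Hypothetical estimate of what the device can spare right now.
interface HardwareEstimate {
  availableMemoryMb: number; // memory not taken by other apps
  cpuScore: number;          // made-up normalised score, 0 (none) to 100 (plenty)
}

// Illustrative thresholds only; the weaker of the two resources decides the rank.
function toDeviceRank(h: HardwareEstimate): DeviceRank {
  if (h.availableMemoryMb < 128 || h.cpuScore < 10) return "minimal";
  if (h.availableMemoryMb < 512 || h.cpuScore < 30) return "low";
  if (h.availableMemoryMb < 2048 || h.cpuScore < 60) return "mid";
  return "high";
}

// The same phone, idle and while another app is hogging memory and CPU.
const idle: HardwareEstimate = { availableMemoryMb: 1500, cpuScore: 55 };
const busy: HardwareEstimate = { availableMemoryMb: 300, cpuScore: 20 };

console.log(toDeviceRank(idle)); // "mid"
console.log(toDeviceRank(busy)); // "low"
```

Ranking on what is available right now, rather than on what the hardware could do in theory, is what answers the “what if another app is busy” question above.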

Battery state

That’s easy, we already have that! The Battery Status API is a candidate recommendation specification, and is fully supported in Chrome/Opera and partially supported in Firefox. All that’s left is to hope that support in other modern browsers will arrive soon.
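A minimal sketch of how a site might act on it: the function below takes the two fields it cares about (charging and level, which mirror what the Battery Status API’s BatteryManager exposes) and decides whether to skip the fancy bits. The BatteryLike name, the function name and the 20% default threshold are assumptions for illustration.

```typescript
// The subset of battery information this sketch needs.
interface BatteryLike {
  charging: boolean;
  level: number; // 0.0 (empty) to 1.0 (full)
}

// Skip non-essential animations when running on a nearly empty battery.
// The 0.2 default threshold is arbitrary.
function shouldSkipFancyAnimations(battery: BatteryLike, threshold = 0.2): boolean {
  return !battery.charging && battery.level <= threshold;
}

console.log(shouldSkipFancyAnimations({ charging: false, level: 0.1 })); // true
console.log(shouldSkipFancyAnimations({ charging: true, level: 0.1 }));  // false
```

In browsers that implement the promise-based version of the API, the object that navigator.getBattery() resolves to could be passed in directly.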

Monetary cost

That part is tricky since browsers don’t actually have info regarding the data costs, and in many cases (such as tethered WiFi) our assumptions about the cost implications of network type are wrong.

I think that the only way out of this puzzle is asking the user. Browsers need to expose an interface asking the user for their preference regarding cost (e.g. enable them to mark certain WiFi networks as expensive, mark roaming as expensive, etc.).

Another option is to expose a way for developers to ask the user’s permission to perform large downloads (e.g. message synchronization, video download, etc.), and the browser can remember that preference for the current network, across multiple sites.
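To illustrate that second option, here is a rough sketch of the bookkeeping a browser could do. Nothing like this exists today: createPermissionStore, the networkId strings and the askUser callback (standing in for a browser-owned prompt) are all hypothetical.

```typescript
type Decision = "allow" | "deny";

// askUser stands in for a browser-owned prompt; networkId is a made-up
// opaque identifier that the browser would keep to itself.
function createPermissionStore(
  askUser: (networkId: string, sizeMb: number) => Decision
) {
  const remembered: Record<string, Decision | undefined> = {};

  // Any site asking on the same network gets the remembered answer,
  // so the user is prompted at most once per network.
  return function mayDownload(networkId: string, sizeMb: number): boolean {
    let decision = remembered[networkId];
    if (decision === undefined) {
      decision = askUser(networkId, sizeMb);
      remembered[networkId] = decision;
    }
    return decision === "allow";
  };
}
```

Since the decision is keyed on the network rather than on its type, a tethered hotspot can be marked as expensive while an unlimited 4G plan is not.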

What we definitely shouldn’t do is tell developers that they should deduce cost from the network type being WiFi. Even if this is a pattern often used in the native apps world, it is blatantly wrong and is ignoring tethering as well as the fact that many cellular plans have unlimited data. (Which brings back memories of me trying to sync music albums over unlimited 4G before going on a 12-hour flight, and the native app in question telling me “we’ll sync as soon as you’re on WiFi”. Argh!)

Why not progressive enhancement?

Why do we need to expose all that info at all? Why can’t we just build our Web sites to progressively enhance, so that the content downloads progressively, and the users get the basic content before all the fancy stuff downloads, so if their network conditions are bad, they just get the basic parts?

Well, progressive enhancement is great for many things, but cannot support some cases of content adaptation without adding extra delays.

  • The use-case of adapting resource byte-size to network conditions cannot be fully addressed with progressive enhancement, since it gives us no control over the compression quality of the resources we’re serving our users. While dimensions can be controlled through srcset and progressive video loading, they can often be crude instruments for that purpose, since upscaling smaller resolution resources would often have worse quality than a heavily compressed resource.
  • There are cases in which developers would want to tailor the site to the network conditions, e.g. sending a single, decent quality image instead of multiple low resolution images or to replace video ads with static ads.
  • Progressive enhancement can’t take into account the user’s monetary cost of the network or the user’s preference, and will continue to download the “fancy” bits even if the user prefers they won’t be downloaded.
  • Progressive enhancement can’t “go easy” on devices that would download all the site’s images, scripts and fonts only to later on choke on them, due to lack of memory and CPU. In order to properly support low-end devices as well as high-end ones without adding unnecessary delays to the common case, developers need an indication of device capabilities (ideally as a Client-Hint on the navigational request) in order to serve a simplified version of the site to devices that can’t handle more than that.
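As a sketch of what server-side adaptation could look like if such hints arrived with the navigational request: the function below picks one of three site variants from already-parsed conditions. The UserConditions shape, the variant names and the decision rules are assumptions; the network and device values are the discrete ones proposed earlier in this post, not existing hints.

```typescript
type NetworkQuality = "unusable" | "bad" | "decent" | "good" | "excellent";
type DeviceRank = "minimal" | "low" | "mid" | "high";
type SiteVariant = "simplified" | "standard" | "rich";

// Hypothetical, already-parsed hints that came in with the request.
interface UserConditions {
  network: NetworkQuality;
  device: DeviceRank;
  saveData: boolean;
}

function chooseVariant(c: UserConditions): SiteVariant {
  // Devices or networks that can't cope get the simplified site right away.
  if (c.device === "minimal" || c.network === "unusable") return "simplified";

  const fastNetwork = c.network === "good" || c.network === "excellent";
  const capableDevice = c.device === "mid" || c.device === "high";

  // The rich version needs everything to line up, including the user's preference.
  if (fastNetwork && capableDevice && !c.saveData) return "rich";
  return "standard";
}
```

The decision is made before the first byte of the response is sent, so none of the variants pays an extra round trip for the adaptation.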

What happens when multiple paths are used?

As pointed out by Ryan Sleevi of Chrome networking fame, multi-path would break any attempts to expose either the available or theoretical network bandwidth. That is absolutely true, and yet another reason why we don’t want to expose the raw bandwidth, but a discrete and abstract value instead. The browser can then expose the overall effective bandwidth it sees (aggregated from all network connections), even in a multipath world.

How do we prevent it being the new User Agent string?

Another concern that was raised is that exposing network information would result in worse user experience (due to developer abuse of the exposed data), and would therefore result in browsers lying about the actual conditions the user is in.

In my view, the doom of the User-Agent string as an API was that it required developers to make assumptions about what that string means regarding other things that actually matter to them (e.g. feature support).

While I agree with those concerns regarding downlinkMax and type, I believe that as long as we keep the assumptions that developers have to make to a minimum, there’s no reason developers would abuse APIs and harm their users’ experience while doing so. That also means that there would be no reason for browsers to eventually lie, and provide false API values.

What about the extensible Web?

Doesn’t the concept of exposing a high-level value rather than the raw data stand at odds with the Extensible Web manifesto?

I don’t think it does, as long as we also strive to expose the raw data eventually. But exposing the full breadth of network info or device capabilities info is not trivial. It would most probably require an API based on the Performance Timeline and I suspect it would have some privacy gotchas, since exposing the user’s detailed network, CPU and memory usage patterns smells like something that would have interesting privacy implications.

So, we should totally try to expose a low-level API, but I don’t think we should hold off on exposing the high level info (which I suspect would satisfy most use-cases) until we have figured out how to safely do that.

To sum it up

I strongly believe that exposing network conditions as well as other factors about the user’s environment would provide a solid foundation for developers to better adapt the sites they serve to the user’s conditions. We need to be careful about what we expose though, and make sure that it will not result in assumptions, abuse and lies.

Thanks to Paul Kinlan, Tim Kadlec and Jake Archibald for reviewing and commenting on an early draft of this post.

Written by Yoav Weiss

