In this article, I’ll be taking a fresh look at PWAs. As well as exploring implications for both SEO and usability, I’ll be showcasing some modern frameworks and build tools which you may not have heard of, and suggesting ways in which we need to adapt if we’re to put ourselves at the technological forefront of the web.
1. Recap: PWAs, SPAs, and service workers
Progressive Web Apps are essentially websites which provide a user experience akin to that of a native app. Features like push notifications enable easy re-engagement with your audience, while users can add their favorite sites to their home screen without the complication of app stores. PWAs can continue to function offline or on low-quality networks, and they allow a top-level, full-screen experience on mobile devices which is closer to that offered by native iOS and Android apps.
Best of all, PWAs do this while retaining – and even enhancing – the fundamentally open and accessible nature of the web. As suggested by the name they are progressive and responsive, designed to function for every user regardless of their choice of browser or device. They can also be kept up-to-date automatically and — as we shall see — are discoverable and linkable like traditional websites. Finally, it’s not all or nothing: existing websites can deploy a limited subset of these technologies (using a simple service worker) and start reaping the benefits immediately.
The spec is still fairly young, and naturally, there are areas which need work, but that doesn’t stop them from being one of the biggest advancements in the capabilities of the web in a decade. Adoption of PWAs is growing rapidly, and organizations are discovering the myriad of real-world business goals they can impact.
You can read more about the features and requirements of PWAs over on Google Developers, but two of the key technologies which make PWAs possible are:
- Service Workers: A special script that your browser runs in the background, separate from your page. It essentially acts as a proxy, intercepting and handling network requests from your page programmatically.
Note that these technologies are not mutually exclusive; the single page app model (brought to maturity with AngularJS in 2010) obviously predates service workers and PWAs by some time. As we shall see, it’s also entirely possible to create a PWA which isn’t built as a single page app. For the purposes of this article, however, we’re going to be focusing on the ‘typical’ approach to developing modern PWAs, exploring the SEO implications — and opportunities — faced by teams that choose to join the rapidly-growing number of organizations that make use of the two technologies described above.
We’ll start with the app shell architecture and the rendering implications of the single page app model.
2. The app shell architecture
// Run this in your console to modify the URL in your // browser - note that the page doesn't actually reload. history.pushState(null, "Page 2", "/page2.html");
The bigger problem facing SEO today is actually much easier to understand: rendering content, namely when and how it gets done.
Note that when I refer to rendering here, I’m referring to the process of constructing the HTML. We’re focusing on how the actual content gets to the browser, not the process of drawing pixels to the screen.
In the early days of the web, things were simpler on this front. The server would typically return all the HTML that was necessary to render a page. Nowadays, however, many sites which utilize a single page app framework deliver only minimal HTML from the server and delegate the heavy lifting to the client (be that a user or a bot). Given the scale of the web this requires a lot of time and computational resource, and as Google made clear at its I/O conference in 2018, this poses a major problem for search engines:
On larger sites, this second wave of indexation can sometimes be delayed for several days. On top of this, you are likely to encounter a myriad of problems with crucial information like canonical tags and metadata being missed completely. I would highly recommend watching the video of Google’s excellent talk on this subject for a rundown of some of the challenges faced by modern search crawlers.
But server-side rendering is a concept which is frequently misunderstood…
“Implement server-side rendering”
This is a common SEO audit recommendation which I often hear thrown around as if it were a self-contained, easily-actioned solution. At best it’s an oversimplification of an enormous technical undertaking, and at worst it’s a misunderstanding of what’s possible/necessary/beneficial for the website in question. Server-side rendering is an outcome of many possible setups and can be achieved in many different ways; ultimately, though, we’re concerned with getting our server to return static HTML.
So, what are our options? Let’s break down the concept of server-side rendered content a little and explore our options. These are the high-level approaches which Google outlined at the aforementioned I/O conference:
- Dynamic Rendering — Here, normal browsers get the ‘standard’ web app which requires client-side rendering while bots (such as Googlebot and social media services) are served with static snapshots. This involves adding an additional step onto your server infrastructure, namely a service which fetches your web app, renders the content, then returns that static HTML to bots based on their user agent (i.e. UA sniffing). Historically this was done with a service like PhantomJS (now deprecated and no longer developed), while today Puppeteer (headless Chrome) can perform a similar function. The main advantage is that it can often be bolted into your existing infrastructure.
The latter is cleaner, doesn’t involve UA sniffing, and is Google’s long-term recommendation. It’s also worth clarifying that ‘hybrid rendering’ is not a single solution — it’s an outcome of many possible approaches to making static prerendered content available server-side. Let’s break down how a couple of ways such an outcome can be achieved.
So, what other options are there? If you can’t justify the time or expense of a full isomorphic setup, or if it’s simply overkill for what you’re trying to achieve, are there any other ways you can reap the benefits of the single page app model — and hybrid rendering setup — without sabotaging your SEO?
Having rendered content available server-side doesn’t necessarily mean that the rendering process itself needs to happen on the server. All we need is for rendered HTML to be there, ready to serve to the client; the rendering process itself can happen anywhere you like. With a JAMstack approach, rendering of your content into HTML happens as part of your build process.
A great example is GatsbyJS, which is built in React and GraphQL. I won’t go into too much detail, but I would encourage everyone who’s read this far to check out their homepage and excellent documentation. It’s a well-supported tool with a reasonable learning curve, an active community (a feature-packed v2.0 was released in September), an extensible plugin-based architecture, rich integrations with many CMSs, and it allows developers to utilize modern frameworks like React without sabotaging their SEO. There’s also Gridsome, based on VueJS, and React Static which — you guessed it — uses React.
Enterprise-level adoption of these platforms looks set to grow; GatsbyJS was used by Nike for their Just Do It campaign, Airbnb for their engineering site airbnb.io, and Braun have even used it to power a major e-commerce site. Finally, our friends at SEOmonitor used it to power their new website.
3. Service Workers
First of all, I should clarify that the two technologies we’re exploring — SPAs and service workers — are not mutually exclusive. Together they underpin what we commonly refer to as a Progressive Web App, yes, but it’s also possible to have a PWA which isn’t an SPA. You could also integrate a service worker into a traditional static website (i.e. one without any client-side rendered content), which is something I believe we’ll see happening a lot more in the near future. Finally, service workers operate in tandem with other technologies like the Web App Manifest, something that my colleague Maria recently explored in more detail in her excellent guide to PWAs and SEO.
Ultimately, though, it is service workers which make the most exciting features of PWAs possible. They’re one of the most significant changes to the web platform in its history, and everyone whose job involves building, maintaining, or auditing a website needs to be aware of this powerful new set of technologies. If, like me, you’ve been eagerly checking Jake Archibald’s Is Service Worker Readypage for the last couple of years and watching as adoption by browser vendors has grown, you’ll know that the time to start building with service workers is now.
We’re going to explore what they are, what they can do, how to implement them, and what the implications are for SEO.
What can service workers do?
- Intercepting network requests and deciding what to do with them programmatically. The worker might go to network as normal, or it might rely solely on the cache. It could even fabricate an entirely new response from a variety of sources. That includes constructing HTML.
- Handling push notifications, similar to a native app. This means websites can get permission from users to deliver notifications, then rely on the service worker to receive messages and execute them even when the browser is closed.
- Executing background sync, deferring network operations until connectivity has improved. This might be an ‘outbox’ for a webmail service or a photo upload facility. No more “request failed, please try again later” – the service worker will handle it for you at an appropriate time.
The benefits of these kinds of features go beyond the obvious usability perks. As well as driving adoption of HTTPS across the web (all the major browsers will only register service workers on the secure protocol), service workers are transformative when it comes to speed and performance. They underpin new approaches and ideas like Google’s PRPL Pattern, since we can maximize caching efficiency and minimize reliance on the network. In this way, service workers will play a key role in making the web fast and accessible for the next billion web users.
So yeah, they’re an absolute powerhouse.
Implementing a service worker
Rather than doing a bad job of writing a basic tutorial here, I’m instead going to link to some key resources. After all, you are in the best position to know how deep your understanding of service workers needs to be.
They key thing to understand — and which you’ll realize very quickly once you start experimenting — is that service workers hand over an incredible level of control to developers. Unlike previous attempts to solve the connectivity conundrum (such as the ill-fated AppCache), service workers don’t enforce any specific patterns on your work; they’re a set of tools for you to write your own solutions to the problems you’re facing.
One consequence of this is that they can be very complex. Registering and installing a service worker is not a simple exercise, and any attempts to cobble one together by copy-pasting from StackExchange are doomed to failure (seriously, don’t do this). There’s no such thing as a ready-made service worker for your site — if you’re to author a suitable worker, you need to understand the infrastructure, architecture, and usage patterns of your website. Uncle Ben, ever the web development guru, said it best: with great power comes great responsibility.
One last thing: you’ll probably be surprised how many sites you visit are already using a service worker. Head to chrome://serviceworker-internals/ in Chrome or about:debugging#workers in Firefox to see a list.
Service workers and SEO
In terms of SEO implications, the most relevant thing about service workers is probably their ability to hijack requests and modify or fabricate responses using the Fetch API. What you see in ‘View Source’ and even on the Network tab is not necessarily a representation of what was returned from the server. It might be a cached response or something constructed by the service worker from a variety of different sources.
Here’s a practical example:
- Head to the GatsbyJS homepage
- Hit the link to the ‘Docs’ page.
- Right-click – View Source
Now run a curl request for the same URL (https://www.gatsbyjs.org/docs/), or fetch the page using Screaming Frog. All the content is there, along with proper title tags, canonicals, and everything else you might expect from a page rendered server-side. This is what a crawler like Googlebot will see too.
This is because the website uses hybrid rendering and a service worker — installed in your browser — is handling subsequent navigation events. There is no need for it to fetch the raw HTML for the Docs page from the server because the client-side application is already up-and-running – thus, View Source shows you what the service worker returned to the application, not what the network returned. Additionally, these pages can be reloaded while you’re offline thanks to the service worker’s effective use of the cache.
You can easily spot which responses came from the service worker using the Network tab — note the ‘from ServiceWorker’ line below.
On the Application tab, you can see the service worker which is running on the current page along with the various caches it has created. You can disable or bypass the worker and test any of the more advanced functionality it might be using. Learning how to use these tools is an extremely valuable exercise; I won’t go into details here, but I’d recommend studying Google’s Web Fundamentals tutorial on debugging service workers.
I’ve made a conscious effort to keep code snippets to a bare minimum in this article, but grant me this one. I’ve put together an example which illustrates how a simple service worker might use the Fetch API to handle requests and the degree of control which we’re afforded:
I hope that this (hugely simplified and non-production ready) example illustrates a key point, namely that we have extremely granular control over how resource requests are handled. In the example above we’ve opted for a simple try-cache-first, fall-back-to-network, fall-back-to-custom-page pattern, but the possibilities are endless. Developers are free to dictate how requests should be handled based on hostnames, directories, file types, request methods, cache freshness, and loads more. Responses – including entire pages – can be fabricated by the service worker. Jake Archibald explores some common methods and approaches in his Offline Cookbook.
The time to learn about the capabilities of service workers is now. The skillset required for modern technical SEO has a fair degree of overlap with that of a web developer, and today, a deep understanding of the dev tools in all major browsers – including service worker debugging – should be regarded as a prerequisite.
4. Wrapping Up
SEOs need to adapt
Until recently, it’s been too easy to get away with not understanding the consequences and opportunities posed by PWAs and service workers.
Instead of criticizing 404 handling or internal linking of a single page app framework, for example, it would be far better to be able to offer meaningful recommendations which are grounded in an understanding of how they actually work. As Jono Alderson observed in his talk on the Democratization of SEO, contributions to open source projects are more valuable in spreading appreciation and awareness of SEO than repeatedly fixing the same problems on an ad-hoc basis.
One last thing I’d like to mention: PWAs are such a transformative set of technologies that they obviously have consequences which reach far beyond just SEO. Other areas of digital marketing are directly impacted too, and from my standpoint, one of the most interesting is analytics.