Boosting testing efficiency: how semantic HTML transforms End-to-End testing

by Stefania Mellai published on

Semantic and accessible HTML is a powerful tool that improves not only human interaction but also the efficiency of software systems. For instance, when users fill out forms with clear labels and accessible input fields, they make fewer errors, so more accurate data reaches the backend and the databases. You may also have heard that a well-structured web page is fundamental for search engine optimization (SEO).
Creating crafts that are accessible and easy to use for everybody should be the top concern of every developer. The good thing is that prioritizing this aspect can also have a positive side effect: it enables seamless interactions between your website and other software components.
Let's explore in particular how to improve your testing processes and save valuable time.

The Connection Between Semantic HTML, Accessibility, and Testing

This topic was beautifully explored by Rita Castro in her talk "a11y and TDD: A Perfect Match". While Rita mainly focuses on unit tests, the same principles apply to end-to-end testing as well, and for the same reasons.

The End-to-End Testing Dilemma

End-to-end testing goes beyond checking individual parts of a website. It tries to act like a real user in a real web browser, testing the whole web page as it would appear to someone actually using it.

Let's imagine we are writing a unit test for a list item component that produces the following HTML:

<li>
  A description for this item
  <button type="button">Pick this item</button>
</li>

And the unit test can target the button like this:

screen.getByRole('button')

This approach is great for testing the component in isolation, which is what "unit" means. It simplifies testing by not worrying about other possible buttons on the page and by ignoring context.

However, unit tests can't fully replicate real-life scenarios. For instance, they can't ensure that clicking the button saves some data for the current user and navigates to another page.

Now, consider our list item used in a full web page, with a topbar that contains buttons and other sections beyond the list container. The list in the page wasn't built with semantics and accessibility in mind, so it looks like this (taken from a real production web application page):

<body>
  <div id="root">
    <div>
      <div>
        <div>
          <div>
            <div>
              <!-- topbar -->
            </div>
            <div>
              <div>
                <header>
                  <div>
                    <h1>Page Title</h1>
                    <h3>Page subtitle</h3>
                  </div>
                </header>
                <div>
                  <!-- a form for filtering the list -->
                  <hr />
                  <div>
                    <div>
                      <div>
                        <div>
                          <div>
                            <p>A description for this item</p>
                          </div>
                        </div>
                      </div>
                      <div>
                        <button>
                          <span> Pick this item </span>
                        </button>
                      </div>
                    </div>
                    <div>
                      <div>
                        <!-- second item -->
                      </div>
                      <!-- ... -->
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</body>

Most end-to-end testing tools can record your actions as you interact with a web page and then replay those actions for automation. They save selectors to automatically identify the elements you interact with.
We want to test a click on the button in the first item of the list. I have to confess that I simplified the HTML in the example a bit; the real selector produced by the tool looks like this:

//*[@id="root"]/div/div/div/div/div/div/div[3]/div/div/div[2]/div[1]/div[2]/button

Whenever you tweak your HTML, even by adding or removing a single div wrapper, a selector like this can stop matching and break your test suite. End-to-end tests aren't something you update every day because they are slow and need lots of resources. So, figuring out why a test failed can be time-consuming and take you away from more important tasks.
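The fragility described above can be reproduced in a few lines. This is a minimal sketch, not the output of any particular testing tool: it uses Python's standard-library xml.etree.ElementTree only because it can evaluate simple XPath expressions without a browser, and both the shortened markup and the RECORDED_PATH selector are hypothetical stand-ins for the real page and the recorded selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the non-semantic list markup.
ORIGINAL = """
<div id="root">
  <div>
    <div><p>A description for this item</p></div>
    <div><button><span>Pick this item</span></button></div>
  </div>
</div>
"""

# The same markup after a refactor added one wrapper div around the item.
REFACTORED = """
<div id="root">
  <div>
    <div>
      <div><p>A description for this item</p></div>
      <div><button><span>Pick this item</span></button></div>
    </div>
  </div>
</div>
"""

# Positional path of the kind a recorder might generate (assumed for illustration).
RECORDED_PATH = "./div/div[2]/button"


def find_button(markup: str, path: str):
    """Return the element matched by the path, or None if nothing matches."""
    root = ET.fromstring(markup.strip())
    return root.find(path)


before = find_button(ORIGINAL, RECORDED_PATH)
after = find_button(REFACTORED, RECORDED_PATH)

print(before is not None)  # True: the recorded path finds the button
print(after is not None)   # False: one extra wrapper is enough to lose it
```

Nothing about the button itself changed between the two versions; only its position in the tree did, which is exactly the kind of detail a user never notices.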

Testing tools are getting better and better at identifying targets, but the algorithms are not bulletproof and can't react to page structure changes with 100% reliability. That's why it is important that targets can be recognized without ambiguity. One of the most common techniques to obtain this clarity is adding unique ids, test-ids or other selectors to the HTML elements, but it's not a best practice, for several reasons:

  • Separation of Concerns: Mixing testing concerns with your application's business logic violates the principle of separation of concerns. You are coupling two distinct and independent things, and a person working on one side may be totally unaware that something is used and needed on the other side. This can even be the same person who hasn't worked on one of the parts for some time and has lost the context in the meantime.
  • Maintainability: Test IDs may need to change or evolve as your application grows and changes. When you embed these IDs directly in your code, you may need to make widespread changes throughout your codebase whenever test requirements change. This can be time-consuming and error-prone.
  • Code Clutter: Embedding test-related identifiers in your source code can clutter the codebase and make it less readable for developers.
  • Accessibility: Relying on test IDs misses the opportunity to get accessibility testing for free. If my test suite explicitly looks for a button, it also ensures that the target remains an accessible button and is not refactored into a non-semantic div later.
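The accessibility point can be sketched in code. This is an illustration under stated assumptions rather than a recommended setup: the data-testid value pick-item, the markup, and the matches helper are all hypothetical, and Python's standard-library ElementTree stands in for a real browser-based test runner.

```python
import xml.etree.ElementTree as ET

# Hypothetical list item that uses a real button and also carries a test ID.
WITH_BUTTON = """
<ul>
  <li>
    A description for this item
    <button type="button" data-testid="pick-item">Pick this item</button>
  </li>
</ul>
"""

# The same item after a refactor replaced the button with a clickable div.
WITH_DIV = """
<ul>
  <li>
    A description for this item
    <div class="btn" data-testid="pick-item">Pick this item</div>
  </li>
</ul>
"""

TEST_ID_SELECTOR = ".//*[@data-testid='pick-item']"  # ignores the element type
SEMANTIC_SELECTOR = ".//li[1]/button"                # requires a real button


def matches(markup: str, selector: str) -> bool:
    """Return True if the selector finds at least one element in the markup."""
    root = ET.fromstring(markup.strip())
    return root.find(selector) is not None


# The test-ID selector keeps matching, so the regression goes unnoticed.
print(matches(WITH_BUTTON, TEST_ID_SELECTOR), matches(WITH_DIV, TEST_ID_SELECTOR))
# prints: True True

# The semantic selector stops matching once the button is no longer a button.
print(matches(WITH_BUTTON, SEMANTIC_SELECTOR), matches(WITH_DIV, SEMANTIC_SELECTOR))
# prints: True False
```

In this sketch the failing semantic selector is the desirable outcome: the test reports that the control stopped being a button.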

The Power of Semantic HTML in Testing

Now, envision a different scenario where your list is built with clean, semantic HTML. Instead of relying on the automatic targeting algorithm, we write a manual selector for the button in the first item of the list, like this:

//li[1]/button

Regardless of any HTML changes you make to the structure surrounding the list item, your test remains rock-solid. To recap:

  • Relying on an automatic targeting algorithm may result in flakiness and, in the best-case scenario, you need to regenerate the target. If your tests require adjustments every time you change the underlying structure of your markup, they may not authentically simulate a user journey but instead are too closely tied to implementation details.
  • Adding arbitrary ids or other test-only selectors couples your markup to your tests and is a bad practice.
  • Having semantic HTML enables easy manual targeting that is more resilient against changes. It should also help detect changes that may break the accessibility of your UI.
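The resilience of a manual, semantic selector can be sketched the same way. The markup, the button labels, and the first_item_button_text helper are hypothetical, and Python's standard-library ElementTree is used only as a browser-free way to evaluate the selector (written with a leading dot because ElementTree paths are relative to the root element).

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic version of the list.
SEMANTIC = """
<main>
  <ul>
    <li>First item <button type="button">Pick the first item</button></li>
    <li>Second item <button type="button">Pick the second item</button></li>
  </ul>
</main>
"""

# The same list after a refactor wrapped it in two extra containers.
REWRAPPED = """
<main>
  <section>
    <div>
      <ul>
        <li>First item <button type="button">Pick the first item</button></li>
        <li>Second item <button type="button">Pick the second item</button></li>
      </ul>
    </div>
  </section>
</main>
"""

# Manual selector: the button inside the first list item, wherever the list lives.
SELECTOR = ".//li[1]/button"


def first_item_button_text(markup: str) -> str:
    """Return the label of the button in the first list item."""
    root = ET.fromstring(markup.strip())
    button = root.find(SELECTOR)
    if button is None:
        raise LookupError("no button found in the first list item")
    return button.text or ""


print(first_item_button_text(SEMANTIC))   # Pick the first item
print(first_item_button_text(REWRAPPED))  # Pick the first item
```

The selector describes what the user is looking for (the button of the first list item) instead of where it sits in the tree, so the extra wrappers make no difference.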

Why It Matters

So, why should you care about semantic HTML and accessibility in testing?

  • Developer and QA Bliss: Life becomes easier for developers and QA engineers. Writing tests becomes a breeze when you can work with straightforward, semantic code.
  • Mimicking Real Users: Your tests better mimic real user behavior. Real users don't look for divs and spans in your web application; they look for headings, buttons, links, and familiar interface elements with clear labels.
  • Reliability and Time Savings: Your tests become more reliable and won't break with every deployment for arbitrary reasons. Say goodbye to countless hours spent tweaking tests and hello to more productive work.

In conclusion, embracing semantic HTML and accessibility in your development process isn't just about adhering to best practices; it's a game-changer for testing. It streamlines your testing efforts, makes your tests more realistic, and ensures they stand the test of time—saving you precious hours and frustration. So, don't underestimate the power of clean code; it could be your testing superhero in disguise.

About Stefania Mellai

Building software is a very special kind of artisanship, in which you create something from nothing. Stefania is a software engineer from Italy who makes impalpable crafts using React, HTML and CSS, with a special eye on accessibility and good UX.

Twitter: @smellai
GitHub: smellai