W3C Web Performance WG, 2 Mar 2023

From YouTube: WebPerfWG call 2023 03 02 - A/B testing update

A

Screen yep all.

B

Right, thank you. So this is the second update about A/B testing in this forum. I think our previous presentation was in March 2022, so it has been almost exactly one year. I'll start with a recap and a summary of updates, and Mika from Optimizely will be co-presenting with me. I will run through the updates, and then Mika will take over for the demo and complete the rest of the deck.

B

I also want to acknowledge the time, effort and guidance that Optimizely has been providing in this incubation, so I'm super appreciative of that.

B

So, just to recap in case anyone is not familiar with the topic: client-side A/B testing usually refers to integrating experimentation-related changes in the browser. The typical flow is that the page that wants to run an A/B test includes a script. When the browser loads the script, it fetches the A/B testing configuration from the provider, and the script then modifies the DOM and makes whatever changes are needed to conduct the experiment in a blocking fashion, either because the script itself is blocking or, depending on the placement of the script, because it hides the document until the changes are applied.
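A toy model of the blocking flow described above may help. It is a minimal sketch, not any provider's snippet: `Page`, `Change`, and `fetchConfig` are hypothetical names, the `hidden` flag stands in for the anti-flicker style some providers use to hide the document, and a `Map` stands in for the DOM.

```typescript
// Hypothetical model of the typical blocking client-side A/B flow.
// `Page`, `Change`, and `fetchConfig` are illustrative stand-ins, not a provider API.
type Change = { selector: string; text: string };

interface Page {
  hidden: boolean;              // stands in for an anti-flicker style that hides the document
  content: Map<string, string>; // selector -> text, a toy stand-in for the DOM
  log: string[];                // records the order of the steps
}

async function runBlockingTest(
  page: Page,
  fetchConfig: () => Promise<Change[]>,
): Promise<void> {
  page.hidden = true; // 1. hide the page so the original content never flashes
  page.log.push("hide");
  try {
    const changes = await fetchConfig(); // 2. network round trip to the A/B provider
    page.log.push("config");
    for (const { selector, text } of changes) {
      // 3. modify the "DOM" only where the target exists
      if (page.content.has(selector)) page.content.set(selector, text);
    }
    page.log.push("apply");
  } finally {
    page.hidden = false; // 4. unblock the page, even if the config fetch failed
    page.log.push("show");
  }
}
```

Everything between `hide` and `show` is on the critical path, including the network round trip for the configuration.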

B

There are a couple of approaches followed in the industry, but either way, once the changes are complete, the page is unblocked, and that's typically how client-side A/B testing works. The upside is that this experimentation method is much more scalable than server-side experimentation, which is the reason for its popularity. It is scalable primarily because it requires less engineering involvement, which makes it very accessible to marketing personnel or product management teams, who don't have to change the code or redeploy the application.

B

There are some downsides to it, mainly the sub-optimal performance typically incurred because the script blocks the page while it makes its modifications, and also because of how late in the page load it is integrated in the browser.

B

So we started our incubation of this approach with a prototype and a proposal, which is linked in the deck, and the link to the deck itself is in the agenda document of this meeting. That's the recap. Our problem statement is: how can we keep all the good parts of client-side A/B testing and minimize or eliminate the negative performance impact?

B

My hope is that client-side A/B testing becomes a tool in the toolbox of any web engineer or web application, so that for smaller tests, and for things that don't require fundamental architectural changes to the application, it is a very viable method, even beyond how widespread it already is today. So hopefully it will be used even more.

B

We started with some initial prototypes that used edge/CDN compute nodes as the integration point, and that was last year's demo. The main highlight of this method is that the edge allows the requests to the origin and to the A/B configuration to be federated in parallel, whereas once you're in the browser, they are sequential.

B

This allows us to reduce the latency of fetching the A/B configuration by making it parallel. The second part of the solution was standardizing the language, or schema, by which we can describe A/B transforms, which in the industry today are mostly a combination of JavaScript and individual, provider-specific formats.
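As a rough, simplified model of why federation helps (it ignores DNS, connection setup, and parsing, and the symbols are introduced here only for illustration): in the browser the configuration fetch can only start after the HTML and the A/B script have arrived, while at the edge the time until both the page and the configuration are available is bounded by the slower of the two fetches.

```latex
T_{\text{browser}} \approx T_{\text{origin}} + T_{\text{script}} + T_{\text{config}},
\qquad
T_{\text{edge}} \approx \max\left(T_{\text{origin}},\, T_{\text{config}}\right)
```

With hypothetical values of 300 ms for the origin, 100 ms for the script, and 150 ms for the configuration, the browser path takes about 550 ms before the experiment can be applied, while the federated path takes about 300 ms.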

B

Standardizing would give us a lot of advantages in terms of the information being portable across providers, and also some other advantages that we can discuss later.

B

The prototype used a MutationObserver-based method to apply A/B transforms in a non-blocking fashion: as the DOM is being created, the A/B changes are applied, which converts one blocking period into a thousand paper cuts, if you will. But it reduces the performance impact to practically non-measurable. So, as I described, the upside is that we could not measure a performance penalty, which is a great thing.
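The MutationObserver approach can be sketched as below, under simplifying assumptions: toy elements are matched by id rather than by CSS selector, and each transform is assumed to target one element and run once. In a browser, `onNodesAdded` would be driven by a `MutationObserver` observing `document.documentElement` with `{ childList: true, subtree: true }`, and matching would use `element.matches(selector)`; `IncrementalApplicator` and `ToyElement` are hypothetical names.

```typescript
// Hypothetical sketch: apply pending transforms as matching elements are inserted,
// instead of blocking the page until everything is ready.
interface ToyElement { id: string; text: string }

interface PendingTransform {
  targetId: string;                // simplified selector: exact id match (assumption)
  apply: (el: ToyElement) => void;
}

class IncrementalApplicator {
  private pending: PendingTransform[];

  constructor(transforms: PendingTransform[]) {
    this.pending = [...transforms];
  }

  // Called for each batch of newly added nodes, the role a MutationObserver callback plays.
  onNodesAdded(nodes: ToyElement[]): void {
    for (const node of nodes) {
      const stillPending: PendingTransform[] = [];
      for (const t of this.pending) {
        if (t.targetId === node.id) t.apply(node); // one small unit of work: a "paper cut"
        else stillPending.push(t);
      }
      this.pending = stillPending; // assumption: each transform runs at most once
    }
  }

  get remaining(): number {
    return this.pending.length;
  }
}
```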

B

The downside is that the prototype required everybody to front their origin with the CDN. There was immediate feedback in the forum that this is a hard ask, and understandably so, because it's an architectural change.

B

So over the last year we have worked together to standardize about 10 transform operations, which are detailed in the explainer; I'm not going to go through them right now. A short summary is that about nine operations are interoperable between the browser and the edge, which means that an instruction can either be applied on the compute node, or wherever the pre-UA stage runs, or it can be pushed down to the client. We call it pre-UA because these are changes applied before the UA encounters the page. That decision can be made by the entity authoring the set of transforms, so it could be the A/B testing provider or somebody curating those tests. But we mostly expect the instructions to be generated by a system, and the format is not that human-friendly by design.

B

There is also a client-only operation, which is custom JS. This is currently artificially limited to the client because most platforms, at least the ones we are prototyping with, such as Cloudflare, do not allow arbitrary JavaScript execution. This could change in the future, but at this point it's a client-only operation.
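One way to picture where each instruction runs is a partition step like the one below. It is a sketch under stated assumptions: the tuple layout follows the demo's [where, selector, operation, ...args] format, but the spellings `preUA`, `onUA`, and `customJs` are assumed, and treating custom JS as always client-only mirrors the current prototype limitation rather than a fixed rule.

```typescript
// Hypothetical sketch: split a transform list into the part applied before the page
// reaches the browser (pre-UA: edge or origin) and the part pushed to the client (on-UA).
type Where = "preUA" | "onUA";
type Transform = [Where, string, string, ...unknown[]]; // [where, selector, operation, ...args]

// Assumption: custom JavaScript can only run in the browser, as in the current prototype.
const CLIENT_ONLY_OPS = new Set(["customJs"]);

function partition(transforms: Transform[]): { preUA: Transform[]; onUA: Transform[] } {
  const preUA: Transform[] = [];
  const onUA: Transform[] = [];
  for (const t of transforms) {
    const [where, , op] = t;
    // The author chooses the location, but client-only operations are always deferred to the UA.
    if (where === "preUA" && !CLIENT_ONLY_OPS.has(op)) preUA.push(t);
    else onUA.push(t);
  }
  return { preUA, onUA };
}
```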

B

We are also working on some more operations: class list and style attribute manipulations, and also relocating an HTML element, which is sort of a corner case but still required based on Optimizely's prior A/B testing experience. Something new we are bringing into client-side A/B testing is HTTP header operations, which is an advantage over the systems we have today: you can now run performance tests, or anything else that requires header manipulation. Now, the next step, and specifically for today's demo, was to make the edge optional, which means moving the pre-UA parts of this standard into the origin and removing the need for fronting with a CDN. That creates a more real-world integration, and also an incremental one.
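A header operation applied in the pre-UA stage might look like this minimal sketch. The operation names `setHeader` and `removeHeader` and the object shape are hypothetical, since the standardized header operations are still being worked on; the function also assumes the header names already in the input map are lowercase.

```typescript
// Hypothetical pre-UA header operations; names and shape are assumptions for illustration.
type HeaderOp =
  | { op: "setHeader"; name: string; value: string }
  | { op: "removeHeader"; name: string };

function applyHeaderOps(headers: Map<string, string>, ops: HeaderOp[]): Map<string, string> {
  const out = new Map(headers); // copy, leaving the origin's headers untouched
  for (const h of ops) {
    const name = h.name.toLowerCase(); // HTTP header names are case-insensitive
    if (h.op === "setHeader") out.set(name, h.value);
    else out.delete(name);
  }
  return out;
}
```

At an edge worker or origin, the same logic would map onto the Fetch API's `Headers` object, whose `set` and `delete` methods are already case-insensitive. A performance experiment could, for instance, add a `Link` preload header to the variant only.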

B

So everything else remains the same: we federate the call to the A/B provider directly from the origin, so fetching the A/B configuration happens in parallel with constructing the page, and everything else happens in the browser.
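The parallelism described here comes down to starting both requests before waiting on either. A minimal sketch, where `renderPage` and `fetchAbConfig` are hypothetical stand-ins for the origin's page generation and the provider's configuration endpoint, and `delay` only simulates latency:

```typescript
// Minimal sketch of origin-side federation: build the page and fetch the A/B
// configuration at the same time, then combine the results.
async function federate<H, C>(
  renderPage: () => Promise<H>,
  fetchAbConfig: () => Promise<C>,
): Promise<{ html: H; config: C }> {
  // Both promises are created before either is awaited, so the two waits overlap.
  const [html, config] = await Promise.all([renderPage(), fetchAbConfig()]);
  return { html, config };
}

// Simulated latency for the usage example only.
const delay = <T>(ms: number, value: T): Promise<T> =>
  new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));
```

Awaiting `renderPage()` and then `fetchAbConfig()` one after the other would instead add the two latencies together, which is the sequential pattern the origin integration avoids.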

B

So again, the upside is that in our performance tests there is no measurable performance penalty with this approach either, partly because the origin timing also lines up. The other thing is that the origin integration is an incremental change compared to fronting with a CDN.

B

Next is the demo, and Mika will take over from here. We thought it would be cool to wear the hat of a marketing team and come up with a campaign to run on a real-looking site. It's a back-to-work campaign, because many companies are calling their employees back to the office. So let's promote a back-to-work campaign, with a specific product injected into the catalog page and some changes on the homepage.

B

I'll stop presenting now and let Mika take over from here.

A

Okay, great, thank you, Alex. I'll attempt to share right now.

A

Okay, can you see my screen? All right, great. So we decided to use a site called atticandbutton.us, which Optimizely built to show off experimentation functionality, for example client-side A/B testing using our snippet. It is built on the Shopify platform and is intended to mimic a normal-ish retail site: there's a home page and landing page, and you can scroll down for featured products, subscriptions, and other things like that.

A

There's search, an About Us page, a blog, and a catalog page. It's navigable, and you can filter by different types of inventory. Retail sites are big users of experimentation for a number of reasons: for experimenting, obviously, but also for personalization, like the back-to-work campaign. So, to show what we have done with our demo, here it is.

A

We are going through a Cloudflare Worker here in order to front the atticandbutton.us site. Here we have a control version, which doesn't apply any variations to the page; it just fetches the page from atticandbutton.us and returns it. It's navigable as well.

A

As we can see, we've got a catalog page here too. So this is our control, all going through the edge worker, and now here is our experiment, which, as I said, is a back-to-work campaign.

A

What we've done is change the background image to a different image and change the title on the homepage to "Ready to go back to the office", all with the same set of transformations that we'll show in a few minutes.

A

It also applies a change where we inject a new item at the front of the catalog page. All of these changes are done on the client side, using our MutationObserver as the DOM is constructed, and the code is there. So that's one example.

A

We have other, more comprehensive examples covering the different kinds of functionality we support, but those are more for testing the operations themselves.

A

This example is more along the lines of what a real experiment, or a real campaign from a marketing perspective, might entail.

A

And this page shows the architecture of our demo in a couple of different ways: a block diagram and a web sequence diagram. The idea is basically that we go through an edge worker to federate the call to the origin: we grab atticandbutton.us and, at the same time, we grab from GitHub the transforms that we want to apply to the page. We can do that in parallel, and what the client receives at the end is simply the final HTML, along with the variations to be applied, embedded in scripts.
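The last step in the diagram, returning one HTML document with the variations embedded, could be sketched as follows. This is an assumption about the wire format, not the demo's actual code: the transforms travel as JSON in a `<script type="application/json">` element with the hypothetical id `ab-transforms`, inserted right after the opening `<head>` tag.

```typescript
// Hypothetical sketch: embed the on-UA transform list into the HTML returned to the client.
function embedTransforms(html: string, transforms: unknown[]): string {
  // Escape "<" so a string such as "</script>" inside the JSON can't end the element early.
  const json = JSON.stringify(transforms).replace(/</g, "\\u003c");
  const tag = `<script type="application/json" id="ab-transforms">${json}</script>`;
  // Insert right after the opening <head> tag (not <header>) so the data is available
  // before the body is parsed; fall back to prepending if there is no <head>.
  const m = /<head(\s[^>]*)?>/i.exec(html);
  if (m === null) return tag + html;
  const end = m.index + m[0].length;
  return html.slice(0, end) + tag + html.slice(end);
}
```

On the client, the applicator could then read the list by calling `JSON.parse` on that element's `textContent`.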

A

This could also have been done directly from the origin without going through an edge worker, but the architecture and representation we have developed make it easy, if you are going through an edge worker, to have more functionality at the edge as well.

Then this is an example of the actual transformations we're applying on the page: it's a list of transformations, and we've applied three of them here. The first parameter is where to apply it: we have on-UA versus pre-UA, pre-UA being, say, on an edge worker, or at the origin if you can handle it there. The second parameter is the selector, in this case for the hero image on the page; these are standard CSS selectors. The third parameter is the operation to execute for the variation, and any other parameters are arguments for the operation itself.

A

So in this case, for set attribute, we're changing the style attribute so that the background image points to a different location. Then we're also changing the hero title by executing custom JavaScript, in this case to change the inner HTML to "Ready to go back to the office". This could also be done through a different operation, but we wanted to show off the custom JavaScript capability. Finally, the third transform selects the specific product item on the catalog page and replaces its entire HTML with a new div.
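The positional format can be written down as a TypeScript type, with a small parser that rejects malformed entries. The location values, operation names, selectors, and argument strings below are assumptions chosen to resemble the three demo transforms; the demo's exact spellings may differ, and `parseTransform` is a hypothetical helper.

```typescript
// Sketch of the positional format: [where, selector, operation, ...args].
type Where = "onUA" | "preUA";
type Transform = [where: Where, selector: string, operation: string, ...args: string[]];

interface ParsedTransform { where: Where; selector: string; operation: string; args: string[] }

// Validates one entry received as untyped JSON and names its positional fields.
function parseTransform(raw: unknown): ParsedTransform {
  if (!Array.isArray(raw) || raw.length < 3) throw new Error("transform needs at least 3 entries");
  const [where, selector, operation, ...args] = raw;
  if (where !== "onUA" && where !== "preUA") throw new Error(`unknown location: ${String(where)}`);
  if (typeof selector !== "string" || typeof operation !== "string") {
    throw new Error("selector and operation must be strings");
  }
  if (!args.every((a) => typeof a === "string")) throw new Error("arguments must be strings");
  return { where, selector, operation, args };
}

// Roughly the three demo transforms; all values are placeholders.
const demo: Transform[] = [
  ["onUA", ".hero", "setAttribute", "style", "background-image: url(/img/back-to-work.jpg)"],
  ["onUA", ".hero__title", "customJs", "el => { el.innerHTML = 'Ready to go back to the office'; }"],
  ["onUA", "#product-1", "replaceHTML", "<div class=\"product\">Back-to-work bundle</div>"],
];
```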

A

We did a few performance tests comparing the control with the demo. We ran a few WebPageTest comparisons of one versus the other. There's a little bit of variance in the results themselves, with medians ranging between about 2.0 and 2.4 seconds, but overall they seem to be pretty comparable on both the home page and the catalog page; there's no real performance impact that we can see. There are links to the WebPageTest results at the bottom here if you're interested. We also ran additional performance benchmarks: WebPageTest by default usually runs only three tests, that is, three loads of the page, and it runs on real devices, which is excellent.

A

But one thing we found is that there is significant variance in these kinds of tests, so we also ran trials using a different mechanism.

A

We ran these trials using Puppeteer, a library that lets you drive headless Chrome to load a page, record the performance timings, and shut down. We ran it in an AWS Lambda function, running 500 trials in parallel for each of the pages we wanted to test.

A

So it's home page control versus home page back-to-work campaign, and catalog page control versus catalog page back-to-work campaign. This column is the median onload event time in milliseconds across the 500 trials for each of those pages, and we also wanted to highlight that the median delta between the two is very small.
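The summary column reduces each page's 500 onload times to a median, and the delta is read here as the variant's median minus the control's. A minimal sketch of that arithmetic, assuming each trial has already produced one onload time in milliseconds (for example, read from the Navigation Timing entry inside the Puppeteer run); `median` and `medianDelta` are hypothetical helpers, and the numbers in the check are made up rather than the demo's data.

```typescript
// Median of a list of timings; copies before sorting so the caller's array is untouched.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of an empty list");
  const sorted = [...values].sort((a, b) => a - b); // numeric sort, not lexicographic
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Added cost of the variant relative to its control, in the same unit as the inputs.
function medianDelta(control: number[], variant: number[]): number {
  return median(variant) - median(control);
}
```

The median is a reasonable choice given the variance mentioned above, since a few slow outlier runs barely move it.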

A

Only about 12 milliseconds were added for executing and applying the variations on the homepage, and only 19 on the catalog page. These numbers are pretty similar to the execution time of some existing A/B experimentation tools, like the Optimizely snippet. Overall, the point is that we are not seeing any measurable performance impact in our results on the demo pages.

A

So that's where we're at, and we have a lot of discussions and lots of things that we want to work on, improve, and cover in the next year or so. As for work in progress and what's next, one big thing is to improve our prototyping and performance testing suite, making it more robust, more consistent, and less variable, so we can really highlight where and when there are performance impacts to the page when using our representation.

A

We are continuing to work on a few additional operations we want to investigate, like header changes and redirects, and also fine-tuning some of our existing operations based on how we are using them in our tests. We're also working on other platform- and framework-specific integrations, like different origins, and how they would actually work in the wild.

A

One big thing we also have to cover is incremental transformation fetching: dynamically obtaining new transformations as a visitor navigates through the site, especially during the lifecycle of a single-page application. That's a big one, because it unlocks a large part of experimentation called behavioral experimentation: for example, for users who scroll this far, or users who click this button, do we get better results if we do X versus Y as a result of that? We don't know that just from the initial page load, so as a visitor navigates the site, we need to be able to dynamically obtain additional transformations.
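Incremental fetching might look like the following sketch, which is an assumption since this part is not specified yet: on each route change the client asks for the transforms relevant to the new route and hands only unseen ones to the applicator. `IncrementalTransformLoader`, `fetchTransforms`, and the `id` field are hypothetical; in a browser, `onRouteChange` could be called from the SPA's router or a Navigation API `navigate` listener.

```typescript
// Hypothetical sketch of incremental transform fetching during an SPA session.
interface RouteTransform { id: string; selector: string; operation: string; args: string[] }

class IncrementalTransformLoader {
  private readonly seen = new Set<string>();
  private readonly fetchTransforms: (route: string) => Promise<RouteTransform[]>;

  constructor(fetchTransforms: (route: string) => Promise<RouteTransform[]>) {
    this.fetchTransforms = fetchTransforms;
  }

  // Returns only the transforms not yet handed out in this visitor's session.
  async onRouteChange(route: string): Promise<RouteTransform[]> {
    const batch = await this.fetchTransforms(route);
    const fresh: RouteTransform[] = [];
    for (const t of batch) {
      if (this.seen.has(t.id)) continue; // already applied earlier in the session
      this.seen.add(t.id);
      fresh.push(t);
    }
    return fresh;
  }
}
```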

A

We are also exploring a browser-native implementation of the applicator, which is a very exciting possibility. We're looking into bundling the applicator in different languages so it can be packaged and shipped in different ways. Having it implemented natively in the browser would help hugely, because we would no longer need to include it in the payload every time you run an experiment on a page, and that would help performance tremendously.

A

We also expect to explore performance trade-offs with directives like 103 Early Hints, among other things.

A

And finally, the transform language itself and applying variations are certainly a vital component of running experiments, but by themselves they aren't enough: you can't just run an experiment and expect to get results. You need additional information, like metric collection about what a user did and what the results of your experiment are, in order to actually act on the results.

A

So we are exploring that, along with SPAs and the dynamic decision-making that happens during a full experiment, to see how to ensure that the language representation will cohesively live alongside an actual running experiment in the wild.

A

I think that's all I have.

B

Cool, let me turn off the recording and...