The science of good design

What is good design? Perhaps numbers, not creativity, hold the answer. Michel Ferreira takes a deep dive into A/B testing



You’ve probably heard or read about A/B tests. However, like most things we digest online there’s a lot of misinformation out there. In this article, I’d like to take the time to look at best practices in A/B testing, common pitfalls and how experimentation can fast-forward your skills to the next level.

Let’s forget everything we already know and start from scratch. An A/B test is a randomised comparison of two versions of a webpage. That means 50 per cent of your traffic is randomly presented with version A of a page and 50 per cent with version B. We use a control (version A) to compare against a variation (version B), which we anticipate will have an effect on a specific metric. The metric can be anything from conversion rate or time spent on the page to the number of clicks or how long it takes to complete a task.

To illustrate this, I’ll start with a simple example. A landing page contains a blue call to action. This is the control against which our test will be measured. On the variation, the call to action is changed to green. An A/B test will run, after which the data will show us whether either version has a negative or positive effect.
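If you’re curious what that split looks like in practice, here’s a minimal sketch in Python of one common approach: hashing a stable user identifier so each visitor is consistently bucketed into A or B. The user ID, experiment name and function are hypothetical, not taken from any particular testing tool.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "green-button") -> str:
    """Deterministically bucket a user into 'A' (control) or 'B' (variation).

    Hashing the user ID together with the experiment name gives a stable,
    roughly 50/50 split: the same visitor sees the same version every time.
    Illustrative sketch only, not any specific tool's implementation.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100           # 0-99, approximately uniform
    return "A" if bucket < 50 else "B"       # 50 per cent control, 50 per cent variation

# Example: decide which call-to-action colour this (made-up) visitor gets
variant = assign_variant("visitor-42")
button_colour = "blue" if variant == "A" else "green"
```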

However, if you ran this test with no hypothesis, and tried to observe the data, you’d see differences in lots of metrics. Unfortunately this won’t help you prove anything.

The best way to get good, reliable results is to develop your experiment with the exact metric you’re targeting in mind. In this case, your hypothesis could be that you expect the green button to receive a higher percentage of clickthroughs than the blue. Let’s dig into this even further.

Design of experiments

I work as a designer at Booking.com, where I sometimes joke our designers should be called ‘designentists’. That’s because we believe in testing absolutely everything that we build. We do this through something called ‘design of experiments’ (DOE). This is a systematic method to determine the relationship between the factors affecting a process and the output of that process.

In other words, it is used to find cause-and-effect relationships. Using DOE, we run experiments to test an idea and to make sure its effect is not caused by chance or external factors.

Since we’re not data scientists, you may think DOE sounds too complicated. But any experiment can be designed just by following these five steps:

1 Make observations
2 Formulate a hypothesis
3 Design and conduct an experiment to test the hypothesis
4 Evaluate the results of the experiment
5 Accept or reject the hypothesis

Let’s break it down and look at each of these steps in more detail.

Make observations

We start by observing user behaviour, either during user research or by looking at the available data collected by our website. You can review historical data to see trends in your customers’ behaviour, or look at Google Analytics and any other tool you have at your disposal. Try to identify the user’s pain points or anything you believe could improve their overall experience.

Formulate a hypothesis

Hypotheses can be simple ideas like ‘If we modify the copy on the Register button, we expect more users will create accounts because of how much simpler it is to understand the new message’ or ‘If we increase the size of a button, we will get more users to complete their purchase because it will improve readability.’

You can experiment with anything you like, as long as you can measure it. So how about a technical improvement? ‘If we remove the extra image calls on a page, we will reduce load time, drive users through the shopping cart faster, and increase conversion.’

When formulating ideas, it’s important to have a clear reason for the change. Your best bet is to test ‘SMART’ questions: those that are significant, measurable, achievable, results-oriented and time-bound. With SMART questions, you’ll get better answers. And those will be very important in the end.

Design and build your idea

Next, you have to publish and run your test. I won’t teach you how to design, but I will point out that execution could make or break your experiment. Choosing something you can measure, with a high probability of impact, can really make a difference here. If you start by accepting that most experiments fail, you’ll be able to perform more tests, faster, and learn from your failures. Iterate, rinse, repeat.

Evaluate the results

How long you need to run a test depends on your traffic: the more visitors you have, the faster you build up a sample, and the less time it takes to achieve confidence in the statistical significance of your hypothesis. If there is a small effect (say a 0.1 per cent increase in conversion rate) you will need a very large sample size to determine whether that difference is significant, or due to chance. Larger effects can be validated with a smaller sample size.
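If you want a feel for those numbers yourself, rather than relying on an online calculator, the snippet below is a rough sketch using the standard normal-approximation formula for comparing two proportions. The baseline conversion rate, the lifts and the 95 per cent confidence / 80 per cent power defaults are illustrative assumptions, not figures from Booking.com.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline: float, lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant to detect an absolute `lift`
    over a `baseline` conversion rate (two-sided test, normal approximation)."""
    p1, p2 = baseline, baseline + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(((z_alpha + z_power) ** 2 * variance) / (lift ** 2))

# A tiny 0.1 percentage-point lift on a 2% baseline needs a huge sample...
print(sample_size_per_variant(0.02, 0.001))   # roughly 315,000 per variant
# ...while a 1 percentage-point lift needs far fewer visitors.
print(sample_size_per_variant(0.02, 0.01))    # roughly 3,800 per variant
```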

But here’s a curve ball. You’ve checked your numbers, and everything indicates you only need one week – let’s call that a business cycle – to achieve your results with the necessary confidence (you can use online calculators to determine that: netm.ag/calculate-286).

But the calculator looks only at numbers. So I ask you this: can you remember what happened last Friday? Now compare it to Friday 5 August 2016, the first day of the Summer Olympics in Rio. Do you think your website’s customers’ behaviour will be the same? The short answer is: no.

Users’ behaviours are affected in unexpected ways, by planned and unplanned events. Because of that, you should run your experiments for at least two full business cycles. That way, you’ll not only get a bigger sample but you’ll also cover your bases if something completely unexpected happens in the week you run the test. Don’t stop the experiment before its cycle is complete, and always run it for full cycles (two full weeks or months).
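As a rough illustration of that rule of thumb, the sketch below rounds a test’s duration up to full business cycles (assuming a weekly cycle) and never lets it drop below two cycles. The traffic and sample-size numbers are made up.

```python
from math import ceil

def weeks_to_run(required_per_variant: int, weekly_visitors: int,
                 cycle_weeks: int = 1) -> int:
    """Round the test duration up to full business cycles (here: weeks).

    Traffic is split 50/50, so each variant sees half the weekly visitors.
    Even if the maths says one week is enough, run at least two full cycles."""
    per_variant_per_week = weekly_visitors // 2
    weeks_needed = ceil(required_per_variant / per_variant_per_week)
    full_cycles = ceil(weeks_needed / cycle_weeks) * cycle_weeks
    return max(full_cycles, 2 * cycle_weeks)   # never less than two full cycles

# Illustrative numbers: 4,000 visitors per variant needed, 20,000 visitors a week
print(weeks_to_run(4_000, 20_000))   # -> 2 (the two-cycle minimum kicks in)
```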

For valid experimentation, you also need to make sure you run both versions of your page at the same time, with randomised users, and not version A for 100 per cent of the traffic and then B for 100 per cent of the traffic. That would mean you were testing your solution against two different user bases, and not getting real results.

Data does not speak

When developing your questions, consider how you will measure their success. Because in most cases, when the time comes to analyse the data, the answers won’t be descriptive. All you’ll see is ‘yes’, ‘no’ or ‘goodbye’ (inconclusive results).

Ask yourself: can your question be easily answered by these responses? For instance, let’s say you ask: would a hamburger menu icon work better for my website than the word ‘menu’? Reviewing the data, you see ‘no’. Can we make the question better, so the answer is easier to understand? Let’s reformulate the same question using the SMART format and add some measurable goals.

How about this: based on collected data, we believe the hamburger menu icon could be bad for our users and hinder proper navigation of our website’s secondary actions. We will test this by assuming that a new version with the word ‘menu’ instead of the icon will be easier to understand and improve overall menu engagement (clicks on the menu and on all links inside the menu). We also expect the overall number of users who finish their hotel reservations to be impacted. This will run on our mobile website for two weeks before we make a decision.
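One way to keep a hypothesis SMART is to write it down in a structured form before the test starts. The sketch below captures that menu experiment as a simple record; the field names are illustrative, not part of any real experimentation framework.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    """A SMART hypothesis: what changes, what we measure, and for how long."""
    observation: str
    change: str
    primary_metric: str
    secondary_metrics: list[str] = field(default_factory=list)
    duration_weeks: int = 2          # at least two full business cycles

# Hypothetical record for the mobile menu test described above
menu_test = Hypothesis(
    observation="The hamburger icon may hinder navigation of secondary actions",
    change="Replace the hamburger icon with the word 'menu' on the mobile site",
    primary_metric="menu engagement (clicks on the menu and on links inside it)",
    secondary_metrics=["completed hotel reservations"],
    duration_weeks=2,
)
```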

Yes, no, goodbye

Time to look at our data and accept or reject the hypothesis. Review the question and see if the answer is now obvious. Does it show an improvement or positive difference? Was there an obvious negative impact on the metric you were aiming for? Or does it simply show no significant result, leaving the experiment inconclusive?
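For a conversion-style metric, that accept/reject decision often comes down to a two-proportion z-test. Here’s a minimal sketch that maps the outcome onto ‘yes’, ‘no’ or ‘goodbye’; the conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def evaluate(conv_a: int, n_a: int, conv_b: int, n_b: int,
             alpha: float = 0.05) -> str:
    """Two-proportion z-test: 'yes' (B wins), 'no' (B loses) or 'goodbye' (inconclusive)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    if p_value >= alpha:
        return "goodbye"                            # inconclusive
    return "yes" if p_b > p_a else "no"

# Illustrative counts: 410 conversions out of 20,000 (A) vs 492 out of 20,000 (B)
print(evaluate(410, 20_000, 492, 20_000))           # -> 'yes'
```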

A common mistake is to believe that an inconclusive result means there is a neutral effect, and therefore to consider it an acceptable change. Beware: the fact that you can’t see a variation doesn’t mean the feature is acceptable or better for your users. It just means you can’t measure the impact of what you’re testing. You will either want to review the solution and see if there are other ways to solve the problem, or abandon the idea entirely.

In some cases you’ll see a difference in metrics that are not your primary focus. Try to understand why this is happening. If there’s a negative impact, is it obvious why? For instance, let’s say you tried to improve sales by moving the disclaimers into a new tab on your product page, but then saw a huge number of cancellations or product returns.

Or is it something you can’t place? Let’s say you modified a navigation item in the header of the website, and now users are filling out review forms more often. In this case, your design intuition will be key to understanding what the data is saying. It’s like trying to understand what your users are saying just by looking at their body language.

More importantly, don’t accept any positive result just because it is positive, especially if it is not obvious or related to your hypothesis (‘There is no such thing as magic, Harry’). Sure, you want your results to be positive, but more importantly you want them to be true.

To conversion and beyond

At Booking.com we optimise our website in small steps. Not because we want to obsess over every small detail, but because we want measurable steps that, when validated, make our product better. Rather than improving one thing by 10 per cent (which is really difficult on a high-performing website), we go out and find thousands of things to improve by a fraction of a per cent. This is achievable and much simpler.

Don’t try to optimise more than one thing at a time. This often fails to produce results and, even when it does, it is impossible to know why and to learn from it. Say you ran a test in which you added an image and changed the colour of the button at the same time, and this generated positive results. You won’t know if it was the image or the colour change that created the effect, making it impossible to learn from it.

Considering most tests fail, if you had a negative result would you be able to say for sure that users prefer pages without images and without green buttons? As Colin McFarland says: ‘Design like you’re right. Test like you’re wrong.’

Finding value

Why is testing the key to good design? Because ‘good’ is subjective in any case, and trying to define ‘good design’ is even harder. Designers disagree on any number of things. Is Helvetica a good font or not? Don’t get us started on Comic Sans. These discussions are never-ending and no one is right.

A/B testing takes opinions out of the equation. You come in with an idea: an educated guess at what you believe is good for the users of your website. And the users show you the answer. It’s a democracy of good ideas: ideas that you believe add value to your customer base, and that they have the chance to accept or reject.

Your job then is to use your design and problem-solving skills to keep making your ideas better. The entire goal of your design process shifts to finding better solutions to customers’ issues, and refining those to give them the best experience possible. Doesn’t sound easy, does it? But who said good design was meant to be easy?


A/B basics: A/B testing is a randomised control test, where 50 per cent of your traffic is presented with a variation to test whether that change has any measurable benefit

Facts: If this change had been implemented without using an A/B test, you’d never know if it had any effect

Building a hypothesis: My hypothesis is that increasing the size of a button will get more users to complete their purchase because it will improve readability

Hosted options: If you’re looking for a quick start into the world of A/B tests, there are some great hosted options available

Menu styles: We experimented with using the word ‘menu’ versus an icon to see which pattern suits our users

Just one change: If we just add an image to our landing page next to the call to action, and see a difference, we’ll be able to confidently attribute it to our change

Two changes: Would you be able to say with confidence that users prefer landing pages without images or green buttons?
