When William Sealy Gosset invented the A/B test more than a hundred years ago, he had no idea that he had created the perfect test for Internet businesses.

The A/B test dates back at least to 1908, when William Sealy Gosset, writing under the pseudonym “Student”, developed his t-test to identify the best barley for brewing beer at Guinness. His methodology allowed for a statistically rigorous comparison of two groups to determine whether there was a significant difference between them. While this surely helped keep quality high at Guinness, it also gave everyone from audio engineers to direct mail marketers a powerful tool for determining what works and what doesn’t.
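
For readers curious what Gosset's test actually computes, here is a minimal sketch of the two-sample ("Student's") t-test using made-up barley yield figures for two hypothetical varieties — the numbers are invented for illustration, and a real analysis would typically use a library routine such as `scipy.stats.ttest_ind`:

```python
import math

def students_t(a, b):
    """Pooled-variance t statistic for two independent samples."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    # Sums of squared deviations from each sample mean
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    # Pool the variance across both samples
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))

variety_a = [29.1, 30.4, 28.7, 31.2, 30.0, 29.5]  # hypothetical yields
variety_b = [31.8, 32.5, 30.9, 33.1, 32.0, 31.4]

t = students_t(variety_a, variety_b)
# With 10 degrees of freedom, |t| > 2.228 is significant at the 5% level.
print(abs(t) > 2.228)
```

The t statistic asks how large the gap between the two group means is relative to the variation within the groups; a large absolute value means the difference is unlikely to be chance.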

But A/B testing has really taken off over the past two decades. Companies have recognized that the Internet is tailor-made for A/B testing: changes to apps and websites are quick and cheap, and their impact can easily be quantified in clicks and sign-ups.

A/B testing has been used to test everything from website design to free shipping offers to the attractiveness of lingerie models. It’s likely that the engineers of your favorite websites A/B test parts of those sites every day—and that how you act on the site is a piece of data for them. There’s even a Google Chrome extension that lets you identify when you are part of a test.

Some businesses find that minor design changes, like changing a website’s background color or the size of a button, can influence customer behavior dramatically. Google famously tested 41 different shades of blue for their toolbar. While this may seem like overkill, small differences in click-through rates can mean big dollars for a high traffic website.

Our Data on A/B Testing

In the course of helping a diverse range of businesses conduct A/B tests on their websites, Experiment Engine has accumulated a wealth of data on how people use this statistical test. To understand what kinds of tests companies are conducting, and which ones show real impact, we looked at a dataset consisting of A/B tests conducted in late 2014 and the first half of 2015.

We found that the most common A/B tests are design changes (such as changing a color or the placement of a button) and copy changes (such as testing different text above a button). While design changes are the most common, we also discovered that they are, on average, less impactful than changes to the site’s headlines and written content.

When we examined design changes, we found that tweaks to the placement and arrangement of key elements, especially those that simplified and streamlined the site, tend to show the greatest impact.

What types of A/B tests do companies conduct?

There are two main types of A/B tests that our customers conduct: design changes and copy changes. Copy changes test the content or phrasing of text on the site, whether titles, headlines, paragraphs, or even the “Click Here to Learn More” language on a button.  

Design changes include changes to the color scheme and layout of the page, adding or removing elements like images and buttons, and changes to the functionality of the page.

The table below shows which types of changes are most common.

[Table: frequency of each type of A/B test]

A majority of tests involve design changes, though a significant minority include at least some changes to the website’s copy. The “Other” category is primarily made up of policy changes, such as free shipping offers and changes to return policies.

We work with a variety of websites, but the majority of them are eCommerce, lead generation or content companies. We were curious about how the mix of tests differed between these types of websites.

[Chart: mix of test types by website category]

Design changes predominate across all types of websites, but they are especially favored by Content sites. Perhaps these media-focused sites have confidence in the specific content they wish to display—and are focused on finding the most compelling ways to display it.

Lead generation websites, which sell customer contact information to other businesses looking for sales leads, are unusually likely to focus on testing copy changes. It may be that getting a user to enter their contact information requires a clearly articulated value proposition more than effective design.

The Effectiveness of Different A/B Tests

So do A/B tests actually work? We found that 21.5% of all the A/B tests (for which we have data) showed some statistically significant impact. While this may seem low, most companies that work with us test multiple factors at once, so having roughly one in five show an impact is actually quite good.
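
To make "statistically significant impact" concrete for a conversion test, here is a minimal sketch of a two-proportion z-test — all visitor and conversion counts below are invented for illustration, not drawn from our dataset:

```python
import math

def two_proportion_z(conversions_a, visitors_a, conversions_b, visitors_b):
    """z statistic comparing two conversion rates under a pooled rate."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pool both groups to estimate the common conversion rate under
    # the null hypothesis that the variant makes no difference.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    return (rate_b - rate_a) / se

# Hypothetical test: the control converts 500 of 10,000 visitors (5.0%),
# the variant converts 580 of 10,000 (5.8%).
z = two_proportion_z(500, 10_000, 580, 10_000)
# |z| > 1.96 corresponds to significance at the conventional 5% level.
print(abs(z) > 1.96)
```

An 0.8-point lift that clears the 1.96 threshold would count as a "significant" test in the sense used above; a smaller lift, or the same lift on far less traffic, generally would not.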

Which types of tests most often garner significant results? The chart below shows success rates for each type of A/B test.

[Chart: success rates for each type of A/B test]

With a 26% success rate, changes to the website copy tend to be the most successful. Perhaps Lead Generation websites are on to something. It also appears that making changes to both design and copy in a single test may be a bad idea.

We also wanted to explore whether incremental changes (like Google’s test of 41 shades of blue) or more radical overhauls tend to yield more successful tests.

[Chart: success rates for incremental vs. radical changes]

Clearly, radical changes tend to be more successful. This may be because they are more likely to have a clear hypothesis behind them, while incremental changes are more speculative.

When companies test “design”, what are they testing?

Finally, since a large proportion of our customers’ A/B tests were design changes, we dug into the exact types of design changes companies are making. The table below shows the most popular categories of design changes.

[Table: most popular categories of design changes]

By far the most common design change is the addition or removal of website elements like images, banners, and tabs. Color changes are next at 18% of all design changes, and changes to the placement of elements are also fairly common. Changes to forms, buttons, and website functionality are rare.

We also examined which of these specific design changes are most effective.

[Chart: success rates by category of design change]

Placement and button layout stand out as the most impactful design changes. Both of these changes likely help visitors navigate sites more effectively, so it’s not surprising that they yield the greatest results.  

Changes to color, elements, and functionality are less impactful, with success rates hovering around 17%. Changes to the layout of forms—where visitors enter their contact information—were the least successful. It may be that people are not dissuaded by minor design issues once they decide to fill out a form.

Good Design Works

Overall, the data suggests that A/B testing is most effective in testing fairly radical changes to websites’ copy and overall layout. Design changes focused on optimizing the placement of elements and streamlining the user experience may yield better results than tweaking color patterns.

As a web user, this is good news. Companies see the best results when they improve your experience on their site and write clear and compelling copy. You probably care a lot more about a site being easy to use than whether the website uses the perfect shade of blue.

About the Author

EJ Lawless - Co-Founder & Head of Growth at Experiment Engine. Previously Senior Director of Online Marketing at Indeed.com. In addition to statistics, online marketing, and A/B testing, EJ also loves the Texas Longhorns, AS Roma, quantified self, and most things Texas. Despite no musical inclination, he is sometimes known as "DJ Flawless".
  • Anne Holland

    Great article, EJ!

    Also wanted to mention, for anyone interested in A/B case studies, WhichTestWon.com is a fantastic resource.

  • http://www.contentquo.com ContentQuo

    Great stats, thanks for sharing. Amazing to see that far fewer companies experiment with copy than with design. Just imagine the opportunities that copy presents, especially for globalized, multilingual web properties!

  • Barry

    Great post on this. I’d like to share the process we used, to great success, in marketing at Netflix during the eleven years I was there.

    Basically, these are two approaches that can take place simultaneously.

    The first is the standard approach: make minor changes and see what incremental lift you get.

    The second, and in my opinion a bit more powerful, is to vet your A/B candidates first. You start by forming hypotheses based on either an observed behavior (e.g. a feature that tested well is not performing well in real life), known issues (e.g. the theme of the landing experience hasn’t changed as market conditions have), or business needs (e.g. the need to improve conversion). Then use qualitative research to try big ideas that address the hypotheses. These solutions should be as far from each other thematically as possible. From that you find which ideas are DOA. Cull them, and refine the ones that showed promise based on observation and feedback. Then take those candidates into A/B testing. This approach means your A/B candidates are already optimized and primed to move the needle significantly.

    Obviously the second approach takes more time and is costlier but is a great way to make big shifts leveraging A/B testing.

  • Hanne Van den Berghe

    How many emails would you have to compare with each other to be representative?

  • Chris McCarthy

    Great post! Interesting to see how the various content types can be leveraged.

  • https://www.analytics-toolkit.com/ Georgi Georgiev

    Nice summary, though it’s likely biased due to the type of tests your system allows. For example, redesigning complete processes or testing backend changes like ranking or ordering of items is likely not something your customers were able to do.

    It would also be good to know not only which test variants produced the most dramatic lifts (assuming nominal statistical significance is actual and not biased by common errors), but also to have some kind of range or variance for these values. As you know, averages can be highly misleading :-)