Tell us, do you read long articles?
Do you watch long YouTube videos? Do you listen to podcast episodes for over 30 or even 60 minutes?
I often get asked whether anyone even reads my articles because they are so long, or whether I’m actually wasting a lot of time working on them. Just replace the word “long” with comprehensive, profound, or relevant. Not because they’re synonyms, but because then we’re talking about quality instead of just quantity. And then the question sounds quite different, doesn’t it? I also like to distinguish between having read and having understood.
The latter is even more difficult to achieve.
How long a blog article or a text should be, or how short a video has to be so that people consume it, is in my opinion not the decisive question. Much more important is how much “scope” is necessary to reach the goal – please think about your “business goal” as well as that of your target group – and how you manage to design your content in such a way that people want to consume it.
The answer to this is so individual that it cannot be generalized. Even if many supposed experts like to do it. It’s not working. Of course, there are recommendations from platforms and best practice experiences from market players, but they are anything but universal. It’s better if you find out for yourself what works and what doesn’t: through systematic testing based on very specific hypotheses.
And that applies not only to length, but to the effectiveness of your content in every respect.
I am writing this article about “content testing” because I am convinced that many questions in content marketing can be answered much better through targeted experiments – because the answers are more individual and data-based. We just have to get used to actively asking these questions and always questioning the answers.
Table of Contents
In this article you will find a total of 11 content-testing examples, including the following questions:
- Does the visual design of texts affect the achievement of goals, for example, the newsletter subscription rate?
- Is the length of a text crucial?
- How can I design long texts to motivate users to read them in full?
- Which headline is the best?
- Which cover photo works better?
Experiments in content marketing, why not?
Experimentation is a tried and tested way of measuring the effectiveness of your content. The knowledge gained from this will help you with production, distribution, and, above all, the continuous optimization of your content portfolio. A/B tests in particular are important to find out which content works, which doesn’t, and above all why. So how do different elements such as headlines, formatting, and visual design or different types of content affect traffic, customer behavior, and the conversion rate?
In e-commerce, testing is almost common practice, especially in the area of usability and other user experience (UX) tests. Amazon, Shopify, Booking, and Netflix – all benefit from testing. But the content (in marketing) can also be excellently tested, for example, to validate assumptions about your own target group, to formulate texts that actually resonate positively (brand promise, unique selling points, sales arguments, etc.), or to determine the headlines for blog articles and ad texts with the most clicks.
“When it comes to testing, people love focusing on UX, design, layout, and call-to-actions, yet they often neglect copy. In my opinion, you should focus on nailing your value proposition, everything else is just window dressing.”
The Strengths of Content Testing
In two cases in particular, we can develop effective solutions through experiments: on the one hand, when we do not know what to do, for example because we lack information; on the other hand, when we do not know or cannot estimate what effect an intended change will have on a goal. Or in short: through testing, we can answer the what, the how, and most importantly the why.
Whether blog articles, e-mail marketing, social media, copywriting, visual design, landing pages, or search engine optimization – quantitative and/or qualitative tests can be sensibly integrated almost anywhere. There are a number of variables that we can test: title and meta title, the description of structured data, the length of the article, the introductory paragraph, or the integration of image and video content.
For example, if it’s organic search, we should probably test the meta title, the inclusion of rich media, and the addition of FAQ schema. If, on the other hand, it’s paid distribution via content discovery networks like Outbrain and Taboola, we’d better focus on title/headline iteration as well as article length and template.
Content testing on “value pages” is particularly exciting, i.e. on those websites that you use to boost your business – be it through traffic generation via blog articles, product communication via product pages, or acquisition via contact pages. Just use your web analytics tool to see which pages are accessed particularly frequently or have a particularly high conversion rate.
We divided articles into three broad categories: Performance, Social, and Organic. While a lot of companies and agencies will probably focus on the latter two, it was the performance articles that really brought in the traffic, the leads, and the sales, and they were the focus of our optimization efforts. At CareerFoundry, organic content – and particularly organic search content – is integral to the company’s success. Therefore, we spend much more time testing there than in content discovery.
The different types of testing
Many of the following examples are based on A/B/n tests, in which different variants are compared across randomized, segmented target groups. This happens either dynamically on one and the same page using appropriate testing tools, or through the targeted distribution of traffic to different pages – in the latter case, we speak of split testing.
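To make the evaluation side of such a comparison concrete, here is a minimal sketch of how the result of a simple A/B test could be checked with a two-proportion z-test. The conversion numbers are invented for the demo and do not come from any test mentioned in this article:

```python
import math

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test for an A/B comparison.

    conv_* = number of conversions, n_* = number of visitors per variant.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))                # two-sided p-value

# Hypothetical numbers: variant B converts 250/5000 visitors vs. A at 200/5000.
p = ab_test_p_value(200, 5000, 250, 5000)
print(f"p-value: {p:.4f}")  # below 0.05 here, so the uplift is unlikely to be chance
```

In practice, you would also fix the sample size and significance level before the test starts, rather than peeking at the p-value as data comes in.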
There are also other types of testing. One example is so-called “preliminary testing” (pretest for short). This form of testing is typical for TV advertising: before the broadcast, the ad is shown to a small audience first to avoid gross blunders and to ensure that the advertising message will really come across.
Content testing: step by step to the first experiment
Once you have identified promising pages or maybe whole page types (e.g. “blog articles”, “pillar pages”, or “service pages”), you can get started.
Specify your problem: Which observable, and therefore measurable, symptoms make you suspect potential for optimization? This could be, for example, an increasing bounce rate as shown in the screenshot below, or a low or decreasing newsletter subscription rate.
We should always take a closer look at above-average and, above all, increasing bounce rates (Screenshot: Google Analytics)
Identify test fields through user research, for example through jobs-to-be-done interviews, copy testing, or usability tests. Such qualitative research methods are important because only they provide information about why, for example, the bounce rate increases.
While analytics gives you metrics on how a page or its elements are performing, it doesn’t tell you what exactly resonates with your audience and what creates friction. Based on this open-ended feedback, we can formulate a hypothesis for A/B testing. You can’t optimize copy without qualitative user research.
An often neglected step
Revert the changes (optional). If you want to be certain, you should validate, by restoring the control variant after completing a test, whether the change in the goal metrics was really caused by your changes to the content/design. In that case, the numbers should return to pre-test levels. This would really validate the hypothesis, and you can roll out the new variant again without hesitation.
When experimenting, always be open to unexpected results; often the reasons for a change are not what you anticipated!
A problem can be solved in different ways. It is therefore important that you test individual solutions one after the other. It is best to prioritize your hypotheses in advance based on the expected business impact (e.g. percentage increase in the conversion rate) and the associated effort for implementation.
This way you ensure that you optimize the most important aspects and benefit from content testing right from the start.
Important: Prioritize your ideas!
I don’t want to go into too much detail here, because this task is not as trivial as it sounds, and I would rather dedicate a separate article to the topic in the future. But I want to at least create awareness that the potential of individual ideas varies and that we should invest our resources in selected optimization projects in a very targeted manner. You may be familiar with frameworks such as ICE, PIE, or the MoSCoW method, but I also find Speero’s “PXL Framework” exciting. Viljo Vabri explains it like this:
This framework helps us to measure signal strength based on how many different data points we have to support a hypothesis, combined with an “ease of implementation” meter. With objectivity at its core, this model requires data to impact the scoring. It breaks down what it means to be ‘easily implemented’, and instead of taking a shot in the dark regarding the potential impact of your idea, this model quantifies the potential.
I find it exciting because it can be adapted for different content use cases, such as for SEO or lead generation. You can find details and a template for this here in the Speero Blog.
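To illustrate the general principle of scoring and ranking test ideas (this is not Speero’s actual PXL spreadsheet, just a simplified ICE-style sketch with made-up idea names and scores):

```python
# Simplified ICE-style prioritization sketch: each test idea gets a 1-10 rating
# for expected impact, confidence in the hypothesis, and ease of implementation.
# The names and numbers below are purely illustrative.
ideas = [
    {"name": "Rewrite meta title",      "impact": 8, "confidence": 6, "ease": 9},
    {"name": "Add table of contents",   "impact": 5, "confidence": 7, "ease": 8},
    {"name": "Shorten intro paragraph", "impact": 4, "confidence": 5, "ease": 10},
]

# Multiply the three ratings into one score per idea.
for idea in ideas:
    idea["score"] = idea["impact"] * idea["confidence"] * idea["ease"]

# Highest score first: these are the experiments to run before the rest.
for idea in sorted(ideas, key=lambda i: i["score"], reverse=True):
    print(f'{idea["score"]:>4}  {idea["name"]}')
```

Frameworks like PXL replace the subjective 1-10 ratings with binary, data-backed criteria, but the mechanic – score, then rank, then work top-down – is the same.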
Let’s now look at a few examples and empirical values. This will certainly provide inspiration for your own experiments.
Examples and inspiration for goal-oriented and data-driven content testing
In our book Content Design, Ben Harmanus and I describe many ways to visually design texts – by using headings, paragraphs, lists, line spacing, etc. These are all proven design tools that make a text more harmonious, but please never equate that with more effective. Only appropriate tests will let you answer that question objectively!
Does the visual design of texts affect the achievement of goals, for example, the newsletter subscription rate?
Yes, the optics can affect the conversion rate both positively and negatively.
Is it worth revising texts for the sake of good language?
Not in 75% of the cases tested. As long as the quality of what is written is acceptable, there is often no reason to beautify the language because this does not create any additional value for the user.
Say what you have to say in a way that everyone understands. A good text does not necessarily need more.
Does the length of a text matter?
This is the question every content marketer asks sooner or later. We saw the argument for short texts in the previous example: hardly anyone reads everything; instead, people scan extensive texts in search of relevant key terms. But the counter-argument is simple: what if users want more information, or you simply need more text to explain a product?
The answer is, in a way, just as simple, and it can even be objectified through appropriate tests:
As long as more text also means more value for the user (ergo is relevant), length doesn’t matter.
How can I design long texts to motivate users to read them completely?
As the tests from the previous example show, relevance is decisive for the basic interest of the users. For James Flory, Director of Experimentation Strategy, and his colleagues at WiderFunnel, however, this first test was the starting point for further experiments – with graphics within the articles, visually contrasting quotes, info boxes, and other content elements, which add value for users while at the same time loosening up the text and making the “content experience” more varied. Cumulatively, they have generated a sales uplift of 26% to date. Iteratively, long-form content can become more and more effective – and with the help of targeted experiments, we also learn why.
Which headline is the best?
The question is valid, but at the same time we should also ask: when is the title the only thing that needs to motivate a person to click, for example in a search or on an overview page, and when can you provide more context with a supplementary post text?
It’s no secret that the New York Times, for example, tests the titles of its articles. However, it is exciting to see how many articles it tests, how many variants it tests in each case, and how successful it is with them. Tom Cleveland, a software engineer at Stripe, analyzes exactly that with his NYT tracker. One pattern already apparent from this analysis: over time – unfortunately – headlines become more and more dramatic.
Overall, the data shows that tested articles are 80% more likely to end up on the most-popular list. In addition, the number of tests correlates with the engagement rate (e.g. reactions in the form of comments or social media shares). Nevertheless, the proportion of headlines tested is quite low overall at 29%, and most tests (79%) only include two alternatives. Cleveland suspects one reason for this rather rudimentary testing is that the NYT earns less than a third of its revenue from advertising (and falling). A front page full of clickbait headlines is likely to turn off potential subscribers, who account for nearly two-thirds of the company’s revenue.
Do headline tests even work?
As Alex Birkett (former Growth Manager at HubSpot) correctly points out in his article on cxl.com, a headline usually has a very limited lifespan – especially in high-frequency media outlets such as the NYT. For example, an A/B comparison that needs to run for four weeks to reach a reasonable level of confidence isn’t really useful. So, strictly speaking, the NYT example is not an A/B test. Rather, it’s about finding out as quickly as possible which title generates the strongest response. However, we run the risk of succumbing to so-called “confirmation bias”, because on the one hand we cannot control external variables (especially segmented target groups), and on the other hand we take more than one metric into account (clicks, shares, etc.).
Confirmation bias is the tendency to select, identify, and interpret information in a way that confirms one’s expectations.
Testing headlines via advertisements is definitely the better alternative, but it involves additional costs and time.
The top tier is certainly so-called “multi-armed bandit tests”, but I refer to this article on the subject so as not to go beyond the scope here.
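To give a rough idea of how such a bandit differs from a fixed 50/50 split, here is a minimal epsilon-greedy simulation. The headline names and click-through rates are invented for the demo, and real bandit implementations are considerably more sophisticated:

```python
import random

random.seed(42)

# Simulated "true" click-through rates per headline variant (assumptions for the demo).
true_ctr = {"Headline A": 0.04, "Headline B": 0.06, "Headline C": 0.05}
shown = {h: 0 for h in true_ctr}
clicks = {h: 0 for h in true_ctr}

EPSILON = 0.1  # 10% of impressions: explore a random variant

for _ in range(20_000):
    if random.random() < EPSILON or not any(shown.values()):
        headline = random.choice(list(true_ctr))  # explore
    else:
        # Exploit: show the variant with the best observed CTR so far.
        headline = max(shown, key=lambda h: clicks[h] / shown[h] if shown[h] else 0.0)
    shown[headline] += 1
    if random.random() < true_ctr[headline]:  # simulate whether this impression clicks
        clicks[headline] += 1

# Unlike a fixed split, most traffic drifts toward the strongest observed variant.
print({h: shown[h] for h in true_ctr})
```

The appeal for headlines is exactly the short lifespan mentioned above: instead of waiting weeks for significance, the bandit shifts traffic toward the likely winner while the article is still fresh.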
The bottom line is: headline tests are possible, but the time window is extremely small, and it requires an enormous amount of traffic on the individual (!) articles/pages in order to achieve meaningful results.
How aggressively can I advertise products within my articles? Is advertising tolerated at all?
Many companies want to boost their sales through content – in the long term and perhaps more indirectly, but often also directly and in the short term. Depending on the goal you are pursuing with a blog article, for example, your “advertisement” will look different: If you are primarily concerned with branding, then your focus is more on a sympathetic story. If, on the other hand, you want to generate leads or otherwise convert visitors, then you need effective calls to action.
Blinkist has also dealt with this concern: how aggressively – or let’s call it confidently – can they promote their product, and when should they first point it out within an article? Their conclusion: Don’t be afraid to talk about your product. Be proud of it!
Another example of such “copy testing” is CXL, which over the last year has optimized the pitch on its home page through qualitative user feedback. What used to come across as elitist and arrogant (on the left in the picture) was then reworked into helpful information about the necessary workload (middle). And a look at today’s version (on the far right in the picture below) shows that it is now fully geared towards social proof in the form of testimonials.
Testing pitches, introductory text, and offer descriptions can reveal valuable information.
How can I make my content more interactive?
Is there anyone who doesn’t ask themselves this question regularly? But how do we actually define “interaction”? For many, scrolling is already an interaction; others understand it as commenting on a post or sharing via social media. This was also the case for a WiderFunnel customer (anonymized), who achieved clearly positive results in a test of the offering and design (including positioning) of share buttons:
Pre-formulated tweets, sticky social sharing icons, and mouse-over effects – if you want to generate clicks, you should try what triggers users most.
Which email subject line is the best?
The subject line is the first and, next to the short preview text, often the only thing that users see of an email. It is therefore the most important basis for deciding whether users open an email or not. Does a personal address via name tokens make sense? How do emojis work? Are questions effective? How long should/may a subject line be?
James Scherer, in his role as Content Editor at Wishpond at the time, asked himself exactly these questions with his team and compiled their experiences in an article: The Highest-Impact Email A/B Tests We’ve Ever Run. The results are sometimes surprising; I find the following test on capitalization particularly exciting! The hypothesis: If we remove capital letters in the subject lines, then the email will feel more personal and make it appear as if a real person wrote it and clicked send too quickly.
A simple test, but in the case of Wishpond with an extremely positive effect.
Emails were originally very personal. What if we imitate this trait?
Newsletter tools have long offered such features and I would actually always test the subject lines of newsletters to understand what influences the open rate over the long term.
Which cover photo works better?
Better in what respect? To attract attention? To trigger emotions? Pictures, too, should always have a function, a specific goal.
As you may have noticed, my articles no longer have cover pictures. Images for link previews in social media posts, yes, but no longer as the “default” at the beginning of an article. Why? Because they simply didn’t add any concrete value at this point and basically kept users away from the actual content (remember the long-form content tests from above).
In this case, no cover photo means that readers get to the content faster – or see more content at first glance – and can therefore benefit more quickly from my articles.
A possible metric in this case would be “time to value”. I didn’t test it (unfortunately), but since we haven’t observed a downturn in the numbers since then, I consider this change a success.
How quickly can users get added value from our content?
Do users need a summary, table of contents, and/or jump labels?
The thought behind it: do such elements help users navigate through the content, make it easier to understand, and get to the information they are looking for faster? For example, if you observe high bounce rates or low scroll rates and consequently suboptimal conversion rates (or generally a low “engagement rate”, as it is called in the new Google Analytics 4), such content elements can increase the usability of your articles and thereby ensure that users find the information they are looking for more easily and faster. WiderFunnel also went in a similar direction with their tests for The Motley Fool (see point 4), and as of the publication of this article, a corresponding test is also running here.
Jump marks in the table of contents can make orientation and navigation easier for users.
How can I improve the CTR on search results pages?
Crazy that we’re not just talking about rankings, isn’t it? “Snippet optimization” (keywords: title tag, meta description, but also schema markup) is often a real quick win to increase the value of existing rankings. Because what we ultimately want – in addition to pure visibility in search engines – are clicks.
The SearchPilot team (makers of an “SEO A/B testing software”) experimented with this in a local context and integrated the name of a brand – admittedly very well known, but anonymized in the example – into the page title, with a positive effect on the click-through rate, more precisely a 15% uplift.
If your brand strength is influential, putting it first in the title tag for users to see could tempt more of them to click through to the page – just because your name has more visibility. We believe this test performed as well as it did because displaying a powerful brand in the search results helped make these pages more noticeable to the user resulting in improved click-through rates.
Edward and his team are also testing the impact on the CTR in search results of emojis in meta titles, the FAQ dropdown schema, the use of “power words”, the alignment of text snippets with insights from the Search Console, and the use of parentheses – all aspects from a HubSpot study on data-driven headline writing.
However, the SearchPilot example also shows how important a good hypothesis is and how difficult it can be to interpret the results of a test without performing additional tests to in turn validate these findings. If we really want to be sure, we must never stop testing.
As you can see, we can add countless questions to this list. That’s exactly what I meant at the beginning: As content marketers, we have so many questions, but we rarely bother to find real (objective) answers. The potential is enormous! Just imagine if you could add up the uplifts from all these examples…
Findings from Content Testing
- Structure your texts so that users can scan them and they are easy to understand. As long as it adds value, the length of the content does not matter.
- Don’t be afraid to promote your product or service. Especially in corporate blogs, every reader expects it anyway. The decisive factor is how you package your “pitch”: natively, i.e. editorially appealing, or visually high-contrast “in your face” – both can work.
- Build new tests on the insights of previous tests. No matter what the outcome of a long-form vs. short-form content test is, it can set the stage for further testing in that direction.
- Use the findings from your tests to define quality criteria for future content and your content design, e.g. article templates. This allows you to brief (external) creatives and authors better and achieve better and better results over time.
An outlook: Will we see more content testing in the future?
I think the most important learning I’ve found in content testing so far is that UX/UI almost always comes second to content. Best practices, modern layouts, and stylistic design won’t trump content, and in many cases can actually detract from the efficacy of good content.
Design doesn’t work without content. But content doesn’t work without design either. Unfortunately, there are no blueprints for this; rather, design is an iterative, experimental process.
There is nothing wrong with orienting yourself to design standards and following conventions, but, as the name suggests, the user decides about the user experience. And users vary from case to case, so effective content is ultimately custom-designed.