Day 18: A/B Test Hypothesis Engine
The Concept
Most marketing teams that run A/B tests are not testing too little — they are testing the wrong things. They test button colours because button colours are easy to change. They test subject line capitalisation because it requires no design work. They run tests that, even if they win, will move their conversion rate by a fraction of a percentage point. Meanwhile the hypotheses that could actually matter — the ones about whether the value proposition is specific enough, whether the social proof appears at the wrong moment, whether the form is asking for more than the offer justifies — never get tested because no one took the time to generate and prioritise them systematically.
A/B testing is not a creative exercise. It is a hypothesis-management process. The quality of your tests is entirely determined by the quality of your hypotheses, and hypothesis generation is one of the most underinvested activities in conversion optimisation.
The gap between intuition and evidence
Every marketer has intuitions about what is underperforming on their landing pages and emails. The headline feels generic. The CTA is buried. The page is too long. These intuitions are often directionally correct but rarely precise enough to be testable. "The headline feels generic" is not a hypothesis — it is an observation. "Replacing the category-level headline with one that names the specific problem our audience is trying to solve will increase conversion because specificity activates recognition" is a hypothesis, because it identifies the element, the change, and the mechanism.
AI is useful here because it can apply conversion psychology frameworks — specificity, loss aversion, social proof, effort reduction, reciprocity — to your specific copy systematically, generating hypotheses you might not have considered and articulating the mechanism behind each one. This turns an intuition-based testing approach into a structured programme.
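If it helps to see that structure concretely, here is a minimal Python sketch of what a complete hypothesis record contains that an observation does not. The field names are illustrative assumptions, not something the prompt requires:

```python
from dataclasses import dataclass

# An observation names a feeling; a hypothesis names the element,
# the change, and the mechanism, so the result can be explained.
@dataclass
class Hypothesis:
    element: str    # what on the page is being tested
    current: str    # the control, stated in one sentence
    variant: str    # the proposed change, stated in one sentence
    mechanism: str  # the psychology principle that predicts the lift

headline_test = Hypothesis(
    element="hero headline",
    current="Category-level headline naming the product type.",
    variant="Problem-level headline naming the audience's specific pain.",
    mechanism="Specificity activates recognition in the target reader.",
)
```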
Prioritisation is the real skill
Generating hypotheses is easy once you have the right prompt. Prioritising them is where the strategic judgment lives. The two dimensions that matter most are expected impact and implementation effort — and they rarely correlate. The highest-impact hypothesis is often not the easiest to implement, and the easiest tests to run are often the ones least likely to move the needle significantly.
The prompt today produces a ranked list that surfaces both dimensions explicitly, so you can make an informed decision about where to start rather than defaulting to whatever is fastest to implement. The recommended approach is to start with a high-impact, moderate-effort hypothesis — not the most dramatic test, but one that is worth the build time and that will produce a meaningful result.
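One rough way to make that trade-off explicit is to score each hypothesis on impact relative to effort. The sketch below assumes the High/Medium/Low and Easy/Moderate/Complex labels the prompt asks for and maps them to numbers; the weights are illustrative, not a standard, and your own judgment should override the arithmetic:

```python
# Illustrative prioritisation: map the prompt's qualitative labels
# to numbers and rank by impact relative to effort.
IMPACT = {"High": 3, "Medium": 2, "Low": 1}
EFFORT = {"Easy": 1, "Moderate": 2, "Complex": 3}

hypotheses = [
    ("Rewrite hero headline around the specific problem", "High", "Moderate"),
    ("Change CTA button colour", "Low", "Easy"),
    ("Shorten the demo form from 9 fields to 4", "High", "Complex"),
]

ranked = sorted(
    hypotheses,
    key=lambda h: IMPACT[h[1]] / EFFORT[h[2]],
    reverse=True,
)
for name, impact, effort in ranked:
    score = IMPACT[impact] / EFFORT[effort]
    print(f"{score:.2f}  {name} ({impact} impact / {effort} effort)")
```

Note that the high-impact, moderate-effort headline rewrite outranks the easy button-colour test: easy does not mean worthwhile.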
One test at a time
The most common A/B testing mistake is running too many tests simultaneously. When multiple elements change at once, you cannot isolate which change drove the result — and a result you cannot explain is a result you cannot replicate. The hypotheses in today's output are designed to be run sequentially, one per testing cycle, so that each result builds genuine knowledge about what your specific audience responds to.
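A back-of-the-envelope sample-size calculation shows why each hypothesis deserves its own cycle, and why fraction-of-a-point tests rarely pay for themselves. This is the standard two-proportion power approximation, not part of the prompt; the baseline and lift figures below are illustrative:

```python
import math

def visitors_per_variant(baseline, relative_lift):
    """Approximate visitors needed per arm for a two-proportion z-test
    at 5% two-sided significance and 80% power."""
    z = 1.96 + 0.84  # z for alpha/2 plus z for power, at those settings
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p1 - p2) ** 2)

# From a 2.3% baseline: a bold change is detectable with far less
# traffic than a cosmetic one, which is why low-impact tests burn
# entire testing cycles for ambiguous results.
print(visitors_per_variant(0.023, 0.20))  # +20% relative lift: ~18k per arm
print(visitors_per_variant(0.023, 0.05))  # +5% relative lift: ~270k per arm
```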
Your testing roadmap starts today
The ten hypotheses you receive are not a backlog to work through over the next year. They are a ranked starting point. The top three are your next quarter's testing programme. Each test you run — and the result it produces — becomes context for the next round of hypothesis generation. Over time, you build a body of knowledge about your audience's psychology that no competitor can replicate, because it comes from your specific data about your specific customers.
This is what separates companies with a testing culture from companies that occasionally run tests. The culture is not about running more tests. It is about learning faster from each one.
Prompt of the day
Copy this into your AI tool and replace any bracketed placeholders.
Prompt
You are a conversion rate optimisation specialist with deep experience running A/B tests on landing pages, email campaigns, and paid ads. Your job is not to write copy: it is to generate testable hypotheses about why performance might improve if specific elements were changed, and to prioritise those hypotheses by expected impact.

What I am testing: [choose one: landing page / email campaign / ad creative / email subject lines]

The current copy or URL: [paste the full text of your landing page, email, or ad, or provide the URL if it is publicly accessible]

My current performance baseline: [e.g. landing page converts at 2.3% from paid traffic; email open rate 24%, click rate 1.8%]

My primary conversion goal: [e.g. demo booking / email sign-up / purchase / click-through to product page]

My audience: [e.g. HR directors at companies with 200–2000 employees, evaluating people management software for the first time]

One thing I already suspect is underperforming: [e.g. I think the hero headline is too generic and does not speak to the audience's specific situation]

Generate 10 A/B test hypotheses. For each hypothesis:
- State the element being tested (headline, CTA button, social proof placement, form length, etc.)
- Write the current state in one sentence
- Write the proposed variant in one sentence
- Explain the conversion psychology principle behind why the variant might outperform (loss aversion, specificity, social proof, effort reduction, etc.)
- Estimate the relative impact potential: High / Medium / Low, with one sentence of reasoning
- Estimate the implementation effort: Easy / Moderate / Complex

Present the 10 hypotheses ranked from highest expected impact to lowest. Flag the top three as your recommended starting point for a testing roadmap.
Your 15-minute task
Pick one asset you are actively running right now — a landing page getting traffic, an email sequence that is live, or an ad campaign spending money today. Paste the full copy into the prompt (or the URL if public-facing). Fill in your real conversion baseline — even a rough number is fine. Run the prompt. Read the top three hypotheses and ask yourself honestly: have you already tested any of these? If the answer is no, you have your next testing roadmap. Put the first test into your sprint backlog before you close this tab.
Expected win
Ten prioritised A/B test hypotheses for a specific live asset — each with the element being tested, the proposed variant, the conversion psychology behind it, an impact rating, and an implementation effort estimate — ranked so your next test is already decided.
Power user tip
After reviewing the hypotheses, send this follow-up: 'For hypothesis number [X] — write me the two variants I would actually test: the control copy as it currently stands, and the challenger copy with the proposed change applied. Keep everything else identical. Format it so I can hand it directly to a designer or developer.' You go from hypothesis to production-ready test brief in one extra prompt.