Conversion Rate Optimization Agency — Hypothesis Discipline, Not A/B Test Theater.

Quantitative data, qualitative insights, and statistically clean validation — as a coherent system, not push-button optimization. Calvarius works as a CRO agency for e-commerce, B2B lead generation, and SaaS trial sign-ups, with clear research discipline before every action.

Research before testing · Statistical discipline · Economically grounded

Standpoint

Research discipline before action.

Anyone looking for a conversion rate optimization agency has usually noticed the same thing: the market is full of providers selling A/B tests without understanding the discipline behind them. "Let's see if green converts better than red" isn't CRO; it's gambling with ad budget. A test without a hypothesis teaches nothing. A test without a sufficient sample produces a random result. A test without economic grounding optimizes for vanity metrics instead of contribution margin. Anyone taking CRO seriously works differently.

Calvarius treats conversion rate optimization as a research discipline with a fixed sequence of actions, not as a push-button test workshop. That means: every test starts from a hypothesis, every hypothesis from research, and all research from a clear funnel diagnosis. We know where in the funnel the economically relevant conversion losses happen, and we prioritize where the economic leverage is greatest. A 5% improvement on a high-margin product detail page can be worth more than a 30% improvement on a newsletter signup.
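
The comparison between a small lift on a high-margin page and a large lift on a newsletter signup can be made concrete with a short calculation. The following is a minimal sketch: the function name `monthly_margin_gain_eur` and all traffic, rate, and margin figures are hypothetical and chosen only for illustration.

```python
def monthly_margin_gain_eur(sessions, conversion_rate, margin_per_conversion_eur, relative_lift):
    """Additional monthly contribution margin from a relative conversion-rate lift."""
    extra_conversions = sessions * conversion_rate * relative_lift
    return extra_conversions * margin_per_conversion_eur


# Hypothetical inputs, chosen only to illustrate the comparison.
product_page_gain = monthly_margin_gain_eur(
    sessions=40_000, conversion_rate=0.03, margin_per_conversion_eur=25.0, relative_lift=0.05
)
newsletter_gain = monthly_margin_gain_eur(
    sessions=40_000, conversion_rate=0.02, margin_per_conversion_eur=1.5, relative_lift=0.30
)

print(f"Product detail page: {product_page_gain:,.0f} EUR per month")
print(f"Newsletter signup:   {newsletter_gain:,.0f} EUR per month")
```

Under these assumed figures, the 5% lift on the product page is worth roughly four times the 30% lift on the signup. The ranking depends entirely on the margin attached to each conversion, which is why the funnel diagnosis comes first.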

The methodological triad we work with isn't negotiable. Quantitative funnel analysis shows where problems are. Qualitative insights from heatmaps, session recordings, and customer research show why they arise. A/B tests with correct statistics validate what works. Anyone leaving out one of these three levels isn't practicing CRO but gut feeling dressed up as testing. We use all three systematically and in the right order.

Concretely, that means we work with platform tools like Microsoft Clarity, Hotjar, VWO, and Optimizely, combined with GA4 and its BigQuery export for a clean data foundation. We test across the entire customer lifecycle, not just on individual landing pages: ad click, landing page, product detail, cart, checkout, post-purchase, retention. And we choose the methodology appropriate to volume. For SME setups with 200–1,000 monthly conversions, we work primarily qualitatively: customer research, heuristic analysis, and targeted tests where large effects are expected. For medium volume, we use Bayesian statistics and sequential testing. For high volume, we run a classic A/B testing setup. Anyone selling SMEs enterprise methodology burns their budget.

For Shopify stores, we always look at conversion in interplay with the technical base — theme performance, checkout customization, and app selection co-determine how much a test can actually lift. More on our Shopify agency page.

A test without a hypothesis is gambling with ad budget. We don't test blindly.

Typical failures

Where CRO systematically fails in practice.

FAILURE 01

"Let's just test something"

Without a hypothesis and a research foundation, a test is worthless, even if it delivers a result. What was learned? Why did it work? Can the result be transferred to other areas? Tests without a hypothesis are gambling with ad budget. We start every test with a documented hypothesis: "We suspect that X happens because Y, measured via Z."

FAILURE 02

Sample size isn't calculated

A test with too small a sample is statistically worthless. "Variant B is 12% better after 100 conversions" isn't a result, it's chance. A serious CRO practice calculates the necessary sample size before every test from the baseline conversion rate, the minimum detectable effect, the desired confidence level (typically 95%), and the statistical power (commonly 80%). Tests without this calculation are theater.
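
As an illustration of that calculation, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The function name `sample_size_per_variant`, the 2% baseline, the 20% relative minimum detectable effect, and the 80% power default are assumptions for the example, not figures from a real mandate.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, confidence=0.95, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate the test should be able to detect
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed example: 2% baseline conversion rate, 20% relative lift to detect.
n = sample_size_per_variant(baseline_rate=0.02, relative_mde=0.20)
print(f"Visitors needed per variant: {n:,}")
```

Under these assumptions the requirement is roughly 21,000 visitors per variant, which corresponds to several hundred conversions per variant rather than 100 in total. Halving the detectable effect roughly quadruples the requirement, which is why small-volume setups should only test for large expected effects.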

FAILURE 03

Optimizing for micro KPIs instead of profitability

Click-through rate on a banner increases by 30%, and revenue stays the same. Add-to-cart rate improves, but checkout conversion worsens. Optimizing individual metrics without funnel validation merely moves the drop-off to a later stage; it does not improve profitability. We always measure against end-conversion value, not intermediate steps.
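
The gap between a micro KPI and end-conversion value can be shown with two hypothetical test arms. This is a sketch only: the `VariantStats` class and every number in it are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantStats:
    visitors: int
    add_to_carts: int
    orders: int
    contribution_margin_eur: float

    @property
    def add_to_cart_rate(self) -> float:
        return self.add_to_carts / self.visitors

    @property
    def margin_per_visitor(self) -> float:
        return self.contribution_margin_eur / self.visitors


# Hypothetical results: the variant wins on the micro KPI but loses on margin.
control = VariantStats(visitors=10_000, add_to_carts=800, orders=240, contribution_margin_eur=6_000.0)
variant = VariantStats(visitors=10_000, add_to_carts=960, orders=235, contribution_margin_eur=5_875.0)

for name, arm in (("control", control), ("variant", variant)):
    print(f"{name}: add-to-cart {arm.add_to_cart_rate:.1%}, margin per visitor {arm.margin_per_visitor:.3f} EUR")
```

Judged on add-to-cart rate alone, the variant in this invented data set would be declared the winner. Judged on contribution margin per visitor, it would not be rolled out.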

FAILURE 04

Test pipeline without prioritization

"Let's test the headline first. Then the image. Then the button." Without prioritization by economic leverage, 80% of test capacity is wasted on 20% of impact potential. We prioritize with the ICE framework (Impact, Confidence, Ease) or the PIE framework (Potential, Importance, Ease), weighted by the realities of the mandate.

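An ICE prioritization can be sketched in a few lines. The version below averages the three scores, which is one common convention (others multiply them); the `Hypothesis` class, the 1 to 10 scale, and the example backlog are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    impact: int      # expected economic effect, 1 (low) to 10 (high)
    confidence: int  # strength of the research evidence, 1 to 10
    ease: int        # implementation effort, 1 (hard) to 10 (easy)

    @property
    def ice_score(self) -> float:
        # Assumed convention: plain average of the three scores.
        return (self.impact + self.confidence + self.ease) / 3


# Hypothetical backlog entries.
backlog = [
    Hypothesis("Button color on landing page", impact=2, confidence=3, ease=9),
    Hypothesis("Show shipping costs before checkout", impact=8, confidence=7, ease=5),
    Hypothesis("Rewrite product detail headline", impact=5, confidence=4, ease=8),
]

ranked = sorted(backlog, key=lambda h: h.ice_score, reverse=True)
for h in ranked:
    print(f"{h.ice_score:.2f}  {h.name}")
```

In this example the easy but low-impact button test drops to the bottom of the pipeline, even though it would be the quickest to launch.
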
Our methodology

Quantitative, qualitative, statistical — as a coherent system.

Quantitative

The WHERE

Funnel analysis via GA4 and BigQuery. Heatmaps and click tracking via Microsoft Clarity or Hotjar. Cohort analysis for return behavior. Drop-off analyses per funnel stage. Output: concrete hotspots where revenue is being lost.
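
A drop-off analysis per funnel stage reduces to comparing consecutive stage counts. The sketch below assumes the counts have already been exported (for example from GA4); the function name `drop_off_report`, the stage names, and the numbers are hypothetical.

```python
def drop_off_report(stage_counts):
    """Return (from_stage, to_stage, drop_off_rate) for consecutive funnel stages."""
    report = []
    for (prev_name, prev_n), (name, n) in zip(stage_counts, stage_counts[1:]):
        rate = 1 - n / prev_n if prev_n else 0.0
        report.append((prev_name, name, rate))
    return report


# Hypothetical monthly session counts per funnel stage.
funnel = [
    ("landing_page", 50_000),
    ("product_detail", 20_000),
    ("cart", 4_000),
    ("checkout", 2_400),
    ("purchase", 1_200),
]

report = drop_off_report(funnel)
for src, dst, rate in report:
    print(f"{src} -> {dst}: {rate:.0%} drop-off")

worst = max(report, key=lambda row: row[2])
```

The stage with the highest relative drop-off is a starting point, not yet a hotspot. Whether it is economically relevant depends on the value of the sessions lost there.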

Qualitative

The WHY

Session recordings of selected user sessions. Customer surveys at strategic funnel points. User tests with small samples. Heuristic analysis based on Nielsen, Krug, or other established UX frameworks. Output: hypotheses about why the hotspots exist.

Statistical

The WHAT

Methodology appropriate to volume: A/B tests with a correctly calculated sample size where volume permits. Bayesian statistics at medium volume, which express the result as the probability that a variant is better and remain interpretable with less data. At small volume: heuristic-based validation plus documented best-practice implementation instead of unreliable tests. Output: volume-appropriate, validated improvements.

This triad isn't modular — all three layers are mandatory. Anyone testing without research optimizes blindly. Anyone only looking at heatmaps without validation optimizes anecdotally. Anyone analyzing only funnel data without qualitative depth sees the where, but not the why. Only the combination delivers systematic improvement.
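
For the Bayesian approach named under the statistical layer, a minimal sketch of the underlying idea is a Beta-Binomial model with a Monte Carlo estimate of the probability that the variant beats the control. The uniform Beta(1, 1) prior, the function name, and the example counts are assumptions; commercial testing platforms use their own, more elaborate models.

```python
import random


def prob_variant_beats_control(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under uniform Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of each conversion rate: Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical test data: 120 of 4,000 visitors convert in A, 150 of 4,000 in B.
p_b_better = prob_variant_beats_control(conv_a=120, n_a=4_000, conv_b=150, n_b=4_000)
print(f"P(variant B better than A) = {p_b_better:.3f}")
```

The output is a direct probability statement rather than a p-value. The decision threshold (for example 95%) still has to be fixed before the test starts, otherwise the same cherry-picking problems return.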

Our approach

How a CRO mandate runs with us.

  1. Funnel Diagnosis (3–5 days)

     Quantitative analysis of the existing funnel: GA4 audit, tracking validation, drop-off identification per funnel stage. Output: a prioritized hotspot list with economic impact potential.

  2. Research & Hypothesis Formation (1–2 weeks)

     Qualitative insights from heatmaps, session recordings, and optional customer surveys. Heuristic analysis based on UX frameworks. Output: at least three documented hypotheses per hotspot, each with a concrete measurement definition.

  3. Test Pipeline Setup (1 week)

     ICE or PIE prioritization of hypotheses. Sample size calculation per test. Test setup in VWO, Optimizely, or a comparable platform. Tracking validation before go-live.

  4. Test Execution (continuous, from week 3–4)

     Weekly test cycle with a clear hypothesis pipeline. Statistical significance is checked before a test ends. Sequential testing where appropriate for shorter test durations. Strictly no cherry-picking and no stopping early at "good interim states."

  5. Implementation & Scaling (parallel to step 4)

     Successful tests are rolled out to production. Insights flow into subsequent hypothesis formation. A documented "lessons learned" database is built per mandate.

  6. Customer Lifecycle Extension (from month 3)

     Tests are extended across the entire funnel: ad click, landing page, product detail, cart, checkout, post-purchase, retention. CRO isn't a "landing page only" game but customer journey optimization.

This pipeline doesn't run linearly but cyclically — every test insight flows back into hypothesis formation. Economic leverage grows with every iteration because knowledge about the specific mandate audience grows.
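
Why the test execution step rules out stopping at good interim states can be shown with a small simulation. The sketch below runs A/A tests (both arms identical, so every "significant" result is a false positive) and compares stopping at any significant interim look with evaluating only at the planned end. It is not a sequential testing framework, which would adjust the thresholds per look; all parameters are assumptions.

```python
import random
from math import sqrt


def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for two arms of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b / n - conv_a / n) / se


def simulate_aa_tests(runs=400, visitors_per_arm=2_000, looks=5, rate=0.05, seed=1):
    """Return (false-positive rate with peeking, false-positive rate at planned end)."""
    rng = random.Random(seed)
    step = visitors_per_arm // looks
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        significant_at_any_look = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = abs(z_stat(conv_a, conv_b, look * step)) > 1.96
            significant_at_any_look = significant_at_any_look or significant
        peeking_hits += significant_at_any_look
        final_hits += significant  # result of the last, planned look only
    return peeking_hits / runs, final_hits / runs


peeking_rate, final_rate = simulate_aa_tests()
print(f"False positives when stopping at any significant look: {peeking_rate:.1%}")
print(f"False positives when evaluating only at the end:       {final_rate:.1%}")
```

With an unadjusted 1.96 threshold at every look, the peeking rate ends up well above the nominal 5%. Sequential methods exist precisely to allow interim looks without this inflation.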

What we deliver

Capabilities across the CRO spectrum.

  • Funnel Diagnosis & Hotspot Mapping: GA4, BigQuery, drop-off analysis
  • Heuristic & Heatmap Analysis: Nielsen, Krug, Clarity, Hotjar
  • A/B & Multivariate Testing: Sample size, Bayesian, sequential
  • Customer Lifecycle Testing: Ad click to retention
  • UX Research & Customer Insights: Surveys, user tests, VoC
  • Tracking Hygiene as Prerequisite: Server-side, Enhanced Conversions
  • Economic Validation: End-conversion value, contribution margin
  • Test Documentation & Lessons Learned: Knowledge accumulation

Tooling

What we manage CRO operationally with.

Quantitative data foundation

  • Google Analytics 4 with BigQuery export
  • Server-side tracking via GTM Server
  • Funnel visualization in Looker Studio

Qualitative insights

  • Microsoft Clarity (free, GDPR-compliant)
  • Hotjar (for deeper heatmap and survey features)
  • Custom survey tools depending on requirements

A/B and multivariate testing

  • VWO — main platform for most mandates
  • Optimizely — where enterprise features or server-side testing are needed
  • Convert.com as mid-market alternative
  • Custom implementations via feature flags where platform tools aren't enough

Statistical tools

  • Sample size calculator (internal, Calvarius-owned spreadsheets)
  • Bayesian statistics toolkit for more complex analyses
  • Sequential testing frameworks for shorter test durations

What's realistic

Volume realism — the right methodology for your setup.

CRO isn't just classic A/B testing. Most Calvarius mandates are SMEs from the e-commerce sector with monthly budgets of €10,000–100,000 — and thus setups where enterprise A/B test methodology isn't economically viable. We work with the methodology that fits the volume class — not the one that sounds most impressive in conference talks.

CLASS 1 — SMALL VOLUME

200–1,000 monthly conversions

CALVARIUS FOCUS

This is where most SME setups we manage are located. We work primarily with qualitative methods: customer research, user tests with small samples, heuristic analysis according to UX frameworks, heatmap diagnosis via Microsoft Clarity or Hotjar. We deploy A/B tests selectively where we expect substantial effects (15–20% or more) and where test periods of 4–8 weeks are practical. Realistic improvements: 8–25% within 6–9 months — primarily through clean research, best-practice implementation, and targeted tests, not through test-pipeline volume.

CLASS 2 — MEDIUM VOLUME

1,000–5,000 monthly conversions

COMMON CALVARIUS AREA

Here classic A/B testing becomes economically viable, but with adjustments compared to enterprise methodology. We use Bayesian rather than frequentist statistics (interpretable statements with less data), sequential testing for shorter test durations, and a clear focus on large expected effects instead of fine-tuning. Realistic improvements: 10–30% within 6 months, with 2–4 tests running in parallel.

CLASS 3 — VERY SMALL VOLUME

Under 200 monthly conversions

Here classic A/B tests aren't statistically valid, not even with Bayesian statistics. We then work exclusively qualitatively: heuristic analysis, customer surveys, user tests, and best-practice refactoring based on documented industry standards. We say this openly before the mandate starts. It doesn't mean we turn such mandates down; it means we adapt the methodology to reality. Measurable effects in 4–8 weeks.

CLASS 4 — HIGH VOLUME

5,000+ monthly conversions

Enterprise territory, where classic A/B testing at a 95% confidence level is fully scalable. Multivariate tests are possible. Test durations of 2–4 weeks per test. Test pipeline with 3–5 tests running in parallel. Calvarius also manages mandates of this size; they aren't the focus of our mandate mix, but they are a volume class we cover with the same methodological rigor.

In the Calvarius mandate reality, most setups fall into Class 1 or 2 — SME setups with B2C e-commerce focus, where choosing the right CRO methodology is economically decisive. Anyone selling SMEs enterprise methodology burns their budget. Anyone offering SMEs nothing because volume isn't sufficient for classic testing leaves economic leverage on the table. We do neither — we deliver the methodology that fits the setup.
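
The four volume classes can be summarized as a simple lookup. This is a sketch of the mapping described above, not an internal Calvarius tool; the function name is hypothetical, and assigning the shared boundaries (exactly 1,000 and exactly 5,000 conversions) to the higher class is an assumption, since the text leaves them open.

```python
def recommend_methodology(monthly_conversions):
    """Map monthly conversions to the volume class and primary methodology."""
    if monthly_conversions < 200:
        return ("Class 3", "qualitative only: heuristics, surveys, user tests")
    if monthly_conversions < 1_000:
        return ("Class 1", "primarily qualitative, selective A/B tests for large effects")
    if monthly_conversions < 5_000:  # assumption: exactly 1,000 counts as Class 2
        return ("Class 2", "Bayesian statistics and sequential testing")
    return ("Class 4", "classic A/B and multivariate testing")  # assumption: 5,000 counts as Class 4


for conversions in (150, 600, 2_500, 8_000):
    volume_class, method = recommend_methodology(conversions)
    print(f"{conversions:>5} conversions/month -> {volume_class}: {method}")
```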

What you gain

What you get from our CRO discipline.

Research substance before every action. We don't test blindly — every CRO action starts with a documented hypothesis based on quantitative and qualitative research.

Statistical discipline as mandatory component. Sample size calculation, correct confidence levels, no cherry-picking, no premature ending. Real scientific rigor, not test theater.

Economic grounding instead of micro KPI optimization. We measure on end-conversion value and contribution margin, not on click-through rate or add-to-cart rate.

Customer lifecycle view instead of landing page tunnel. CRO across the entire funnel, because impact arises where friction sits — and friction rarely sits only on a landing page.

Honest volume realism. We tell you before the mandate whether CRO is economically viable at your traffic volume. If it isn't, we recommend other levers instead of CRO.

Knowledge accumulation per mandate. Test documentation and lessons-learned database — insights from earlier tests flow into later hypotheses.

REFERENCES

Companies we work with

Our work is rarely loud, but measurable. A selection of companies we have supported in recent years:

  • ATP Autoteile
  • Bluecode
  • Casimum
  • Diaeko
  • Eye-Able
  • Frostfutter Perleberg
  • Happy Cheeze
  • iGO
  • Liebesgut
  • Mondi
  • Mücke
  • Procani
  • Schlafstil
  • Spessarttraum
  • Vantastic Foods
  • Velivery
  • VR Immoservice

Engagements range from operational execution to strategic sparring and coordination of external partners.

Get concrete

CRO audit in 30 minutes — free and non-binding.

In a first conversation, we clarify whether your setup is economically CRO-viable — i.e., whether traffic volume is sufficient for valid tests and where the most impactful hotspots are. You get an honest assessment, even if that means CRO isn't the right lever right now.