Architecture & Benchmarks

See how ChatAds works and how it compares with the alternatives

This page explains why ChatAds is faster and more reliable than internal POCs or LLM-based extraction, and provides benchmarks on real data for comparison.

Build vs Buy

How teams try to build AI chat monetization themselves — and where each stack breaks

The quick POC is spaCy text extraction and basic keyword/BM25 matching. Production builds use LLMs and vector retrieval tools. Then there is ChatAds, which does both extraction and resolution.

Input

AI-generated response

Since you've got AirPods, a better workout pick is the Powerbeats Pro. You can usually find them at Best Buy for around $200.

POC stack: spaCy + keyword/BM25
50ms latency, $0.02 / 1k

Cheap and fast enough for a demo. Breaks on ownership, stores, bare brands, accessories, and model drift.

Likely output: links AirPods, Best Buy, or another noisy surface term.
DIY production stack: LLM extractor + vector retrieval
1s - 2s latency, $0.15 - $0.75 / 1k

Better semantic coverage, but requires another LLM call. Still needs custom validators for wrong brands, accessories, and bad matches.

Likely chooses 'Powerbeats Pro', but costly and slows down the AI response to the user.
ChatAds: extracted keyword + resolved offer
~100ms latency, $0.02 / 1k

Runs extraction and resolution as one commerce-specific pipeline. Returns a tracked offer, or nothing when the match is bad.

Output: chooses 'Powerbeats Pro' with matching link, fast enough to insert into the AI response.
Time to market

Build vs buy: how fast can this safely ship?

A prototype is quick. A production-safe commerce layer is not. The gap is validators, resolution quality, refusal behavior, tracking, and ongoing evals.

Path | Time to market | What ships | Main risk
POC build | 1-2 weeks | Prompt, parser, or keyword/vector lookup against one catalog. | Looks convincing on curated demos. Breaks on ownership, stores, accessories, comparisons, and ambiguous product mentions.
Production-ready internal build | 3-6 months | Extraction logic, catalog resolution, validators, revenue ranking, tracking, rate limits, observability, and evals. | The LLM call slows down the inline response, and you spend countless hours tackling linguistic edge cases while users complain about bad offers.
Robust commercial product | 6+ months | Dedicated ML pipeline, large edge-case corpus, catalog quality controls, customer controls, billing, dashboards, docs, SDKs, and ongoing eval ops. | Internal and customized, but 6+ months of engineering opportunity cost.
Or, ChatAds

Time to market: 1-2 days

Integrate the API and get the production commerce layer without building extraction, resolution, validation, and tracking from scratch.

  • Validated product extraction from generated AI text
  • Catalog resolution with rule-based refusal for irrelevant matches
  • Revenue-aware offer selection and tracked URLs
  • No extra LLM call in the response path
  • API keys, usage tracking, rate limits, and billing controls
Architecture

How ChatAds actually works

End-to-end live request path: two binary monetizable classifiers, intent & entity extraction, catalog resolution with quality filters, rule-based validators, and revenue-optimized selection — all under 100ms, no LLM in the hot path.


Your platform

AI application / chatbot

AI generates a response to the user.

1

Call ChatAds

{
  "response_id": "abc123",
  "conversation_id": "xyz789",
  "response_text": "Here are some great noise-cancelling headphones for travel..."
}

API response

< 100ms

Response with eCommerce link inserted, or original text if no fit.
"Here are some great noise-cancelling headphones for travel: [Sony WH-1000XM5](eCommerce link) ..."
End-to-end latency: < 100ms p50
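
The request and fallback behavior above can be sketched on the client side as follows. This is a minimal sketch, not the ChatAds SDK: the `send` transport function and the reply field name `response_text` are assumptions made for illustration, since this page does not specify the endpoint or the reply schema.

```python
import json
from typing import Callable

# Hypothetical transport: takes a JSON request body, returns a JSON reply body.
Transport = Callable[[str], str]

def monetize(response_id: str, conversation_id: str, response_text: str,
             send: Transport) -> str:
    """Return text with an offer inserted, or the original text on any failure."""
    payload = json.dumps({
        "response_id": response_id,
        "conversation_id": conversation_id,
        "response_text": response_text,
    })
    try:
        reply = json.loads(send(payload))
        # Assumed reply field; fall back to the original text if it is missing.
        return reply.get("response_text") or response_text
    except (OSError, ValueError, AttributeError):
        # Timeouts (OSError), bad JSON (ValueError), or an unexpected shape.
        return response_text
```

Falling back to the original text on every error path keeps the monetization call from ever blocking or corrupting the AI response.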
2

Monetizable binary classifiers

Two independent models decide whether to continue. Fast fail when the response is not monetizable.
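
A fast-fail gate of this kind might look like the sketch below. It assumes both models must agree before the pipeline continues and that each returns a probability; the page states neither, so treat the threshold and the agreement rule as placeholders.

```python
from typing import Callable, Sequence

# Hypothetical classifier type: returns P(monetizable) in [0, 1].
Classifier = Callable[[str], float]

def is_monetizable(text: str, classifiers: Sequence[Classifier],
                   threshold: float = 0.5) -> bool:
    """Continue only if every classifier clears the threshold (assumed rule)."""
    # all() short-circuits, so a "no" from the first model skips the second.
    return all(clf(text) >= threshold for clf in classifiers)
```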

3

Intent & entity extraction

spaCy pipeline with contextual enrichment, intent identification, blocklists, brand matching, and span resolution.

4

Catalog resolution & quality filters

Local CPU database search, LRU cache, semantic similarity matching, then filters for stars, reviews, in-stock, and price.
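
As a rough illustration of this step, the sketch below puts an LRU cache in front of a lookup and then applies quality filters. The toy catalog, the substring match standing in for semantic similarity, and every threshold are assumptions, not ChatAds values.

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Product:
    title: str
    stars: float
    reviews: int
    in_stock: bool
    price: float

# Toy in-memory catalog; a real one would sit behind a local database.
CATALOG = (
    Product("Sony WH-1000XM5 Headphones", 4.6, 12000, True, 329.0),
    Product("Generic Headphones", 3.1, 40, True, 15.0),
    Product("Bose QuietComfort Headphones", 4.5, 9000, False, 299.0),
)

@lru_cache(maxsize=1024)
def search(keyword: str) -> tuple:
    """Cached lookup; substring match stands in for semantic similarity."""
    kw = keyword.lower()
    return tuple(p for p in CATALOG if kw in p.title.lower())

def passes_quality(p: Product, min_stars: float = 4.0, min_reviews: int = 100,
                   max_price: float = 1000.0) -> bool:
    """Filters for stars, reviews, in-stock, and price (thresholds assumed)."""
    return (p.in_stock and p.stars >= min_stars
            and p.reviews >= min_reviews and p.price <= max_price)

def resolve(keyword: str) -> list:
    return [p for p in search(keyword) if passes_quality(p)]
```

Repeated keywords hit the cache, so only the first lookup for a popular product pays the search cost.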

5

Rule-based product result validators

Title similarity, accessory catches, vertical mismatch, brand mismatch, demographic mismatch, and brand-vs-generic comparison.
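
Two of these checks, title similarity and the accessory catch, can be sketched with the standard library alone. The accessory word list and the similarity threshold are illustrative assumptions; the actual validators are not described on this page.

```python
from difflib import SequenceMatcher

# Illustrative accessory vocabulary, not an exhaustive list.
ACCESSORY_WORDS = {"case", "sleeve", "cover", "charger", "strap", "protector"}

def title_similarity(query: str, title: str) -> float:
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, query.lower(), title.lower()).ratio()

def is_accessory(query: str, title: str) -> bool:
    """True when the title names an accessory the query did not ask for."""
    q = set(query.lower().split())
    t = set(title.lower().split())
    return not (q & ACCESSORY_WORDS) and bool(t & ACCESSORY_WORDS)

def validate(query: str, title: str, min_similarity: float = 0.3) -> bool:
    """Accept only titles that are similar enough and not an accessory."""
    return (title_similarity(query, title) >= min_similarity
            and not is_accessory(query, title))
```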

6

Revenue optimization

Expected value per click using commission rate, conversion rate, price, brand strength, CTR, stock, ratings, and review volume.
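
One plausible reading of "expected value per click" is conversion rate × commission rate × price, gated on stock. The sketch below implements only that reading. How brand strength, CTR, ratings, and review volume enter the real score is not stated here, so they are left out rather than guessed.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    name: str
    price: float
    commission_rate: float   # fraction of the sale price paid out
    conversion_rate: float   # P(purchase | click)
    in_stock: bool = True

def expected_value_per_click(o: Offer) -> float:
    """Assumed formula: conversion rate x commission rate x price."""
    if not o.in_stock:
        return 0.0
    return o.conversion_rate * o.commission_rate * o.price

def best_offer(offers, min_ev: float = 0.0):
    """Highest-EV offer, or None (refuse) when nothing clears min_ev."""
    best = max(offers, key=expected_value_per_click, default=None)
    if best is None or expected_value_per_click(best) <= min_ev:
        return None
    return best
```

Under this formula a $50 item with a 10% commission and 10% conversion (about $0.50 per click) outranks a $200 item at 4% and 5% (about $0.40 per click).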

7

Select best keyword & resolve URL

Return the highest expected-value result with the best anchor text and resolved eCommerce URL, or correctly refuse.

Our approach

Why an LLM is the wrong tool for monetizing AI conversations

Calling another LLM to extract products from AI text is the obvious first instinct — and the wrong one. Here's how a deterministic ML pipeline compares to an LLM extraction call across the dimensions that matter for production commerce.

Dimension | ChatAds (ML pipeline) | LLM extraction
Latency | <100ms total. Stable p99. | 800ms-2s typical. p99 spikes to 5s+ during peak load on shared APIs. Variance kills inline use.
Cost* | Fractions of a cent per call. Predictable. | Best models are expensive, old ones hallucinate, and prices are rising.
Accuracy | Pulls directly from text. Catalog-grounded. Extensive linguistic validation. | LLMs hallucinate, and semantic search struggles with intent.
Determinism | Same input → same output. Testable, A/B-able, debuggable. | Outputs drift run-to-run, and LLM updates can break workflows.
Uptime* | Your infrastructure with self-hosted ChatAds. | OpenAI and Anthropic can have outages and latency issues.
Data privacy* | No LLM-vendor data sharing. AI conversations don't leave your stack. | Every call ships your users' AI conversations to a third-party model vendor.

* The uptime, cost, and data-privacy advantages assume a self-hosted or VPC deployment of ChatAds. On the hosted ChatAds API, those concerns still apply, because requests leave your infrastructure. Self-hosting removes that boundary entirely.

10 cases

Extraction benchmarks — who extracts the best keywords?

Each case below is a real AI-generated reply, with the methods shown side by side: what spaCy, gpt-5.4-nano, gpt-5.4-mini, and ChatAds actually returned.

Messages without products

Plenty of AI replies are pure advice — no products mentioned. Returning an offer anyway is ad spam on a non-shopping moment.

AI reply

Strength training comes down to consistency more than equipment. Three sessions a week with progressive overload will outperform an expensive home gym used twice a month.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | Strength training; consistency; equipment; Three sessions; a week; progressive overload; an expensive home gym; twice a month | Extracts phrases only, does not pick a winner | 11.8ms
gpt-5.4-nano | home gym | home gym (hallucinated offer) | 902.3ms
gpt-5.4-mini | none | none (correct) | 1820.4ms
ChatAds | none | none (correct) | 18.4ms

Naive baseline takeaway: Naive LLMs hallucinate a "home gym" or "dumbbell set" offer for an advice-only reply. Failure mode: ad spam on advice content.
Hallucinated products

LLM extractors fill in canonical answers from training data even when the reply names no specific product — inventing SKUs the AI never said.

AI reply

For someone just getting into espresso without spending too much, the standard recommendation has held up for years — small footprint, easy to use, surprisingly capable for the price.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | someone; espresso; the standard recommendation; years; small footprint; the price | Extracts phrases only, does not pick a winner | 13.2ms
gpt-5.4-nano | DeLonghi Stilosa | DeLonghi Stilosa (hallucinated SKU) | 1042.7ms
gpt-5.4-mini | Breville Bambino Plus | Breville Bambino Plus (hallucinated SKU) | 1934.0ms
ChatAds | espresso machine | espresso machine | 19.2ms

Naive baseline takeaway: Naive LLMs invent "Breville Bambino Plus" or "DeLonghi Stilosa", fabricated SKUs the AI didn't actually mention. Failure mode: fabricated SKU.
Multiple products → one pick

AI replies often list three options and clearly highlight one. Naive extractors return all three with equal weight, splitting the offer across competing products.

AI reply

You've got three solid blender options at this price: the Ninja Foodi is durable, the NutriBullet Pro is compact, and the Vitamix E310 is the long-haul investment — that's the one I'd actually pick if you can stretch the budget.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | three solid blender options; this price; the Ninja Foodi; the NutriBullet Pro; the Vitamix E310; the long-haul investment; the one; the budget | Extracts phrases only, does not pick a winner | 18.4ms
gpt-5.4-nano | Ninja Foodi; NutriBullet Pro; Vitamix E310 | Ninja Foodi (picked first, not the highlighted recommendation) | 798.6ms
gpt-5.4-mini | Ninja Foodi; NutriBullet Pro; Vitamix E310 | Vitamix E310 | 1654.2ms
ChatAds | Vitamix E310; Ninja Foodi; NutriBullet Pro | Vitamix E310 | 21.7ms

Naive baseline takeaway: Returns all three blenders without ranking. Splits the recommendation across products that compete with each other. Failure mode: no intent ranking.
Owned / in-use suppression

Replies often acknowledge what the user is already running ("since you're using the X…") before recommending Y. Naive extractors return both and monetize the user's own device.

AI reply

Since you're already running an Anker MagSafe charger, the Apple 20W adapter pairs perfectly — you'll get full speed without buying anything else for the cable.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | an Anker MagSafe charger; the Apple 20W adapter; full speed; anything; the cable | Extracts phrases only, does not pick a winner | 9.7ms
gpt-5.4-nano | Anker MagSafe charger; Apple 20W adapter | Anker MagSafe charger (owned product linked) | 821.4ms
gpt-5.4-mini | Anker MagSafe charger; Apple 20W adapter | Anker MagSafe charger (owned product linked) | 1547.3ms
ChatAds | Apple 20W adapter | Apple 20W adapter | 18.9ms

Naive baseline takeaway: Returns both, including the Anker charger the user is already using. Failure mode: owned product linked.
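
A simple way to approximate owned-product suppression is to drop any product whose mention directly follows an ownership cue. The sketch below uses a small hand-written cue list, which is an assumption for illustration and not how ChatAds detects ownership; real coverage would need far more patterns.

```python
import re

# Illustrative ownership cues, anchored at the end of the text before a product.
OWNERSHIP_CUES = re.compile(
    r"\b(?:since you(?:'ve| have) got"
    r"|since you(?:'re| are) already (?:running|using)"
    r"|you already (?:own|have))\s+$",
    re.IGNORECASE,
)
ARTICLE = re.compile(r"\b(?:an?|the|your)\s+$", re.IGNORECASE)

def drop_owned(text: str, products: list) -> list:
    """Remove products whose first mention is directly preceded by a cue."""
    kept = []
    for product in products:
        idx = text.find(product)
        prefix = text[:idx] if idx >= 0 else ""
        prefix = ARTICLE.sub("", prefix)  # ignore a leading article
        if not OWNERSHIP_CUES.search(prefix):
            kept.append(product)
    return kept
```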
Bare brand mentions

Brands appear in non-shopping contexts — ecosystem comparisons, news, opinion. Naive extractors monetize the brand name with no actual product attached.

AI reply

Apple's tight ecosystem is great if you're already on Mac and iPhone, but it locks you in. Sony and Bose offer better cross-platform pairing.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | Apple's tight ecosystem; Mac; iPhone; Sony; Bose; better cross-platform pairing | Extracts phrases only, does not pick a winner | 12.4ms
gpt-5.4-nano | Apple; Sony; Bose | Sony (bare brand monetized) | 712.5ms
gpt-5.4-mini | Apple; Sony; Bose | Apple (bare brand monetized) | 1430.2ms
ChatAds | none | none (correct) | 17.9ms

Naive baseline takeaway: Returns Apple, Sony, and Bose as products. There's no actual recommendation here, just a comparison of ecosystems. Failure mode: brand-as-topic monetized.
Brand & generic in same span

AI replies often name a branded product and then describe it generically in the same breath. Naive extractors return three or four variants — diluting the offer with subset and category duplicates.

AI reply

The Anker PowerCore 10000 is the standard answer here — a compact 10,000mAh power bank that fits in a pocket and charges most phones twice over.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | The Anker PowerCore; the standard answer; a compact 10,000mAh power bank; a pocket; most phones | Extracts phrases only, does not pick a winner | 14.1ms
gpt-5.4-nano | Anker PowerCore 10000; PowerCore; 10000mAh power bank | Anker PowerCore 10000 | 879.4ms
gpt-5.4-mini | Anker PowerCore 10000; Anker PowerCore; power bank | Anker PowerCore 10000 | 1612.0ms
ChatAds | Anker PowerCore 10000 | Anker PowerCore 10000 | 20.3ms

Naive baseline takeaway: Returns "Anker PowerCore 10000", "Anker PowerCore", "PowerCore", and "power bank" as separate products. Failure mode: subset duplication.
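
Subset duplicates can be removed with a token-set comparison, sketched below as one possible approach (not necessarily the one ChatAds uses). It drops any phrase whose tokens are a strict subset of another phrase's tokens.

```python
def drop_subsets(phrases: list) -> list:
    """Keep phrases whose tokens are not a strict subset of another phrase's."""
    token_sets = [set(p.lower().split()) for p in phrases]
    return [
        phrase
        for i, phrase in enumerate(phrases)
        if not any(token_sets[i] < other
                   for j, other in enumerate(token_sets) if j != i)
    ]
```

This handles "Anker PowerCore" and "PowerCore" but leaves the category duplicate "power bank" in place, since it shares no tokens with the branded name. Linking a category to its branded instance needs catalog knowledge that a token comparison does not have.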
Comparison direction

When the AI says "upgrading from X to Y" or "Y is better than X", only Y should be linked. Naive extractors return both and monetize the device the user is replacing.

AI reply

If you're upgrading from your old MacBook Air to a more powerful machine for video editing, the Lenovo ThinkPad P14s with the Ryzen 7 chip is a strong pick.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | your old MacBook Air; a more powerful machine; video editing; the Lenovo ThinkPad P14s; the Ryzen 7 chip; a strong pick | Extracts phrases only, does not pick a winner | 10.4ms
gpt-5.4-nano | MacBook Air; Lenovo ThinkPad P14s | MacBook Air (comparison source linked) | 872.7ms
gpt-5.4-mini | MacBook Air; Lenovo ThinkPad P14s | MacBook Air (comparison source linked) | 2790.1ms
ChatAds | Lenovo ThinkPad P14s | Lenovo ThinkPad P14s | 22.1ms

Naive baseline takeaway: Treats both products as recommendations and links the device being replaced. Failure mode: comparison source linked.
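
The two phrasings named above ("upgrading from X to Y", "Y is better than X") can be handled with a pair of patterns that identify the source side of the comparison. This is a hedged sketch with only those two patterns; it is not a description of the ChatAds implementation.

```python
import re

# "source" is the product being replaced or beaten; only two patterns shown.
COMPARISON_PATTERNS = [
    re.compile(r"upgrading from (?P<source>.+?) to (?P<target>.+)", re.IGNORECASE),
    re.compile(r"(?P<target>.+?) is better than (?P<source>.+)", re.IGNORECASE),
]

def comparison_targets(text: str, products: list) -> list:
    """Drop products that sit on the source side of a comparison."""
    for pattern in COMPARISON_PATTERNS:
        match = pattern.search(text)
        if match:
            source = match.group("source").lower()
            return [p for p in products if p.lower() not in source]
    return products  # no comparison found: keep everything
```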
Sensitive-context suppression

AI replies sometimes mention products alongside medical, illness, or other sensitive topics. Naive extractors monetize anyway. ChatAds suppresses to avoid affiliate spam in distressing contexts.

AI reply

For chemo recovery, a memory-foam wedge pillow can help with the nausea and post-treatment fatigue — elevating the upper body makes the rough nights more manageable.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | chemo recovery; a memory-foam wedge pillow; the nausea; post-treatment fatigue; the upper body; the rough nights | Extracts phrases only, does not pick a winner | 11.6ms
gpt-5.4-nano | memory-foam wedge pillow | memory-foam wedge pillow (sensitive context monetized) | 768.3ms
gpt-5.4-mini | memory-foam wedge pillow | memory-foam wedge pillow (sensitive context monetized) | 1380.7ms
ChatAds | none | none (correct) | 19.5ms

Naive baseline takeaway: Monetizes "memory-foam wedge pillow" alongside chemotherapy context, which is affiliate spam in a sensitive moment. Failure mode: brand safety failure.
Not in catalog

AI replies often name real products that aren't in your affiliate catalog. Naive extractors return the name and dump the resolution failure on the caller — a downstream search returns no result, or worse, drifts to a no-name fallback. ChatAds checks the catalog inline and returns no offer when no high-confidence match exists.

AI reply

If you're getting into mechanical keyboards, the Topre Realforce R3 is the gold standard — heavy electrostatic-capacitive switches and a tactile feel you can't get from MX-style boards.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | mechanical keyboards; the Topre Realforce R3; the gold standard; heavy electrostatic-capacitive switches; a tactile feel; MX-style boards | Extracts phrases only, does not pick a winner | 12.3ms
gpt-5.4-nano | Topre Realforce R3 | Topre Realforce R3 (no catalog check; caller gets a name, not a SKU) | 762.4ms
gpt-5.4-mini | Topre Realforce R3 | Topre Realforce R3 (no catalog check; caller gets a name, not a SKU) | 1654.0ms
ChatAds | Topre Realforce R3 | none (correct) | 19.8ms

Naive baseline takeaway: Extracts the brand+model correctly but leaves the caller to discover the SKU isn't in catalog. Downstream search returns nothing, or drifts to a no-name keyboard. Failure mode: resolution problem dumped on caller.
Generic-adjective bloat

Marketing adjectives ("high-quality", "premium", "professional-grade") aren't part of a product identity — they pad the phrase but match nothing in a real catalog. Naive extractors keep them, ChatAds strips them.

AI reply

For everyday cooking, a high-quality nonstick skillet handles most stovetop tasks — eggs, pancakes, sautéed veggies, and quick pan sauces.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | everyday cooking; a high-quality nonstick skillet; most stovetop tasks; eggs; pancakes; sautéed veggies; quick pan sauces | Extracts phrases only, does not pick a winner | 12.0ms
gpt-5.4-nano | high-quality nonstick skillet | high-quality nonstick skillet (marketing adjective retained) | 711.3ms
gpt-5.4-mini | high-quality nonstick skillet | high-quality nonstick skillet (marketing adjective retained) | 1289.4ms
ChatAds | nonstick skillet | nonstick skillet | 18.7ms

Naive baseline takeaway: Returns "high-quality nonstick skillet". The marketing adjective inflates the phrase but is meaningless to a real catalog. Failure mode: adjective bloat retained.
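
Stripping marketing adjectives can be as simple as a stop list applied before catalog lookup. The sketch below uses only the three adjectives named above; any longer list would be an assumption.

```python
# The three adjectives named in the text; extend as needed.
MARKETING_ADJECTIVES = {"high-quality", "premium", "professional-grade"}

def strip_marketing(phrase: str) -> str:
    """Remove marketing adjectives so the phrase matches catalog titles."""
    return " ".join(
        word for word in phrase.split()
        if word.lower() not in MARKETING_ADJECTIVES
    )
```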
7 resolution cases

Resolution benchmarks — who resolves the best offer?

Each failure mode below is shown for all three methods. Even when extraction is correct, the wrong resolver produces unsafe links. ChatAds rows are real API output; keyword/BM25 and plain-vector rows illustrate the dominant failure mode of each approach.

Demographic drift

Extracted phrase: digital watch

Source AI reply

A simple digital watch with a long battery life and a backlight is all most people need for daily wear — nothing fancy required.

Method | Returned product | Verdict | Notes
Keyword / BM25 | Kids Cartoon Digital Watch with Light-Up Face | Wrong demographic | BM25 ranks by token overlap × review count. Kids watches dominate review counts in this category.
Plain vector top-1 | Kids Cartoon Digital Watch with Light-Up Face | Wrong demographic | Same review-count bias surfaces in the embedding manifold: high-review SKUs cluster nearby and outrank adult alternatives.
ChatAds | digital watch | Adult digital watch (kids SKU rejected) | -
Why this matters: Generic adult-watch queries land on kids' watches in most consumer catalogs because kids' SKUs accumulate higher review counts. ChatAds runs a demographic-mismatch validator that rejects kids/men's/women's matches when no demographic was specified.
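
A demographic-mismatch check of the kind described can be sketched as a set comparison: reject a title that targets a demographic the query never mentioned. The term lists below are illustrative assumptions.

```python
# Illustrative term lists; a real validator would need broader vocabulary.
DEMOGRAPHIC_TERMS = {
    "kids": {"kids", "kid", "children", "toddler", "boys", "girls"},
    "men": {"men", "men's", "mens"},
    "women": {"women", "women's", "womens"},
}

def demographics(text: str) -> set:
    """Demographic groups named in the text."""
    tokens = set(text.lower().replace(",", " ").split())
    return {group for group, terms in DEMOGRAPHIC_TERMS.items() if tokens & terms}

def demographic_mismatch(query: str, title: str) -> bool:
    """True when the title targets a demographic the query did not ask for."""
    return bool(demographics(title) - demographics(query))
```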
Accessory not the device

Extracted phrase: Lenovo Yoga Slim 7

Source AI reply

If you're shopping for a new ultrabook for college, the Lenovo Yoga Slim 7 is hard to beat for the price — long battery life and a solid screen.

Method | Returned product | Verdict | Notes
Keyword / BM25 | Yoga Slim 7 Sleeve Protective Case | Wrong product type | Three of the four query tokens (Yoga, Slim, 7) appear in the title. Review count breaks the tie toward the case.
Plain vector top-1 | Yoga Slim 7 Sleeve Protective Case | Wrong product type | Sleeve and laptop sit close in the embedding manifold; review-count bias pushes the sleeve to top-1.
ChatAds | no offer | No offer | Accessory validator rejects the sleeve. No device SKU available, so no offer rather than a wrong link.
Why this matters: Cases, sleeves, replacement keyboards, and chargers outnumber the actual device SKU in most catalogs. Both lexical and semantic retrieval drift to whichever accessory has the most reviews. ChatAds validates that the resolved product is the device itself, not an accessory.
Brand drift

Extracted phrase: Dyson V8

Source AI reply

For a reliable cordless vacuum on a tight budget, the Dyson V8 holds up well even years in and the battery is plenty for most apartments.

Method | Returned product | Verdict | Notes
Keyword / BM25 | INSE Cordless Stick Vacuum 6-in-1 | Wrong brand | Token "vacuum" matches; "Dyson" outranked by review count. BM25 has no concept of brand identity.
Plain vector top-1 | INSE Cordless Stick Vacuum 6-in-1 | Wrong brand | Embedding similarity collapses brand signal. High-review no-name vacuum outranks the Dyson SKU.
ChatAds | Dyson V8 Animal Cordless Vacuum | Brand held | -
Why this matters: Plain retrieval ignores brand identity. BM25 returns whatever matches "Dyson" or "vacuum" by review count — often a different generation. Vector drifts further, surfacing high-review no-name vacuums that cluster near the Dyson SKU. ChatAds enforces brand fidelity: if the search term carries a brand, the resolved product must too — or it falls back to a sibling within the brand line.
Generic category collapse

Extracted phrase: chef's knife

Source AI reply

For most home cooks, a quality chef's knife in the eight-inch range is the single most valuable kitchen tool you can buy — it handles ninety percent of prep work.

Method | Returned product | Verdict | Notes
Keyword / BM25 | 8-Piece Knife Block Set with Sharpener | Bundle, not a chef's knife | Token "knife" appears in the bundle title. Review count promotes the bundle over single SKUs.
Plain vector top-1 | Wüsthof 6-Piece Steak Knife Set | Wrong knife type | Embedding clusters all "knife" SKUs together. Steak-knife sets often outrank single chef's knives by review volume.
ChatAds | 8-inch chef's knife | Single quality default | -
Why this matters: Unbranded category extractions are common ("a good chef's knife", "a basic tripod"). Naive retrieval picks the highest-ranked listing — often a multi-piece block set or a kids' practice knife, both of which match "chef's knife" by token. ChatAds runs a generic-prefix-mismatch validator that rejects titles where the query is a prefix of a longer phrase that names a different product.
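
The prefix rule can be illustrated with a token scan: if the query appears in the title and the very next word turns it into a different product, reject. The list of product-changing words is an assumption made for this sketch.

```python
# Illustrative words that turn "chef's knife" into a different product.
HEAD_NOUNS = {"set", "block", "sharpener", "holder", "guard", "bag", "roll"}

def generic_prefix_mismatch(query: str, title: str) -> bool:
    """True when the query is followed in the title by a product-changing word."""
    q = query.lower().split()
    t = title.lower().split()
    # Stop one token early so t[i + len(q)] always exists.
    for i in range(len(t) - len(q)):
        if t[i:i + len(q)] == q and t[i + len(q)] in HEAD_NOUNS:
            return True
    return False
```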
Model number identity

Extracted phrase: Sony A7 IV

Source AI reply

For wildlife photography I'd recommend the Sony A7 IV paired with a 200-600mm telephoto — the autofocus tracking is exceptional and the burst rate handles fast-moving subjects.

Method | Returned product | Verdict | Notes
Keyword / BM25 | Sony Alpha a6400 Mirrorless Camera | Wrong model | Tokens "Sony" + "IV" (Roman numeral) are weak; review count surfaces the more popular a6400.
Plain vector top-1 | Sony Alpha a7C Full-Frame Camera | Wrong generation | Embedding collapses A7 variants. Closest cluster member by similarity isn't the IV.
ChatAds | Sony Alpha 7 IV Mirrorless Camera | Exact model | -
Why this matters: Model numbers (A7 IV, RT-AX86U, S24 Ultra) carry product identity. Lexical search tokenizes them as noise ("A7", "IV") and ranks by review count, often surfacing a different generation. Vector search treats alphanumeric tokens as low-signal and collapses across model variants. ChatAds preserves model-number tokens through embedding and matches them to the exact catalog SKU.
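
As a simplified lexical stand-in for that idea, the sketch below treats any token containing a digit, or a Roman numeral from II to X, as a model token and requires every model token in the query to appear in the candidate title. It does not use embeddings and does not normalize aliases such as "A7" versus "Alpha 7", so it only illustrates the matching rule, not the ChatAds method.

```python
import re

ROMAN_NUMERALS = {"ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"}

def model_tokens(text: str) -> set:
    """Tokens that look like model identifiers (heuristic, may over-match)."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return {t for t in tokens
            if any(c.isdigit() for c in t) or t in ROMAN_NUMERALS}

def model_tokens_preserved(query: str, title: str) -> bool:
    """True when every model token in the query also appears in the title."""
    return model_tokens(query) <= model_tokens(title)
```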
Context vertical mismatch

Extracted phrase: amber night light

Source AI reply

For the nursery, a soft amber night light helps reduce sleep disruption during night feedings — the warm color won't suppress melatonin like white light does.

Method | Returned product | Verdict | Notes
Keyword / BM25 | VEKKIA Industrial LED Shop Light with Amber Mode | Wrong vertical | Tokens "amber" + "light" match. Review count promotes the industrial fixture far above niche nursery lights.
Plain vector top-1 | BLACK+DECKER Workshop LED Floodlight | Wrong vertical | Embedding clusters all amber-emitting lights together. Higher-reviewed industrial SKUs outrank baby-vertical alternatives.
ChatAds | amber night light | Baby-context night light | -
Why this matters: ChatAds emits per-keyword vertical tags from the surrounding context (baby, pet, automotive, gardening, professional) using a ±15-token window around the extracted phrase. When a candidate carries a conflicting vertical tag, the resolution gate hard-rejects it. BM25 and plain vector retrieval have no concept of context vertical — they pick whatever matches the tokens or the embedding.
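
The windowed tagging and hard-reject gate might be sketched like this. The ±15-token window comes from the text above; the cue words, the tag names beyond those listed, and the rule that untagged candidates pass are assumptions.

```python
# Illustrative cue words per vertical.
VERTICAL_CUES = {
    "baby": {"nursery", "baby", "infant", "crib"},
    "pet": {"dog", "cat", "puppy", "kitten"},
    "automotive": {"car", "truck", "dashboard"},
}

def vertical_tags(tokens: list, start: int, end: int, window: int = 15) -> set:
    """Tags from cue words within `window` tokens of the phrase tokens[start:end]."""
    lo = max(0, start - window)
    hi = min(len(tokens), end + window)
    context = {t.lower().strip(".,") for t in tokens[lo:hi]}
    return {v for v, cues in VERTICAL_CUES.items() if context & cues}

def vertical_conflict(keyword_tags: set, candidate_tags: set) -> bool:
    """Hard-reject when both sides are tagged and share no vertical."""
    return (bool(keyword_tags) and bool(candidate_tags)
            and not (keyword_tags & candidate_tags))
```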
Line fidelity within a brand

Extracted phrase: MacBook Air

Source AI reply

For college, the MacBook Air is plenty — battery life is great and it handles writing, browsing, and Zoom without a fan kicking on.

Method | Returned product | Verdict | Notes
Keyword / BM25 | MacBook Pro 14-inch with M3 Chip | Wrong line | Token "MacBook" matches both Air and Pro. Review count promotes Pro variants over Air.
Plain vector top-1 | MacBook Pro 14-inch with M3 Chip | Wrong line | Embedding similarity treats Air and Pro as the same MacBook cluster. Higher-reviewed Pro outranks Air.
ChatAds | MacBook Air M3 | Air line preserved | -
Why this matters: Within a brand line, the differentiating token (Air vs Pro, Mini vs Max, SE vs Ultra) carries product identity. Plain retrieval ignores it: vector clustering treats Air and Pro as semantic neighbors, and BM25 with review-count secondary ranking surfaces the more popular Pro variant. ChatAds runs a line-fidelity gate (CHA-5486) that blocks candidates lacking the differentiating token.
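
A line-fidelity gate can be approximated by requiring every line token in the query to appear in the candidate title. The token list below is taken from the examples above; the real gate's vocabulary and logic are not documented on this page.

```python
# Differentiating tokens from the examples above.
LINE_TOKENS = {"air", "pro", "mini", "max", "se", "ultra"}

def line_fidelity_ok(query: str, title: str) -> bool:
    """Block candidates lacking a line token that the query carries."""
    required = set(query.lower().split()) & LINE_TOKENS
    return required <= set(title.lower().split())
```

A query with no line token ("MacBook") places no constraint on the candidate, matching the description of blocking only candidates that lack the differentiating token.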
Live demo

Test ChatAds using a demo fitness assistant.

Our AI assistant is fine-tuned on fitness responses and uses the Amazon catalog for product resolution.

Bring commerce to AI-generated text

Use ChatAds to detect product recommendations, resolve safe offers, and return tracked links before the response renders.