Frequently asked questions
The essentials on how Semantara works, how it saves you money, and how it protects your keys and your data.
What is Semantara and what problem does it solve?
Semantara is an intelligent proxy that sits between your app and AI providers (OpenAI, Anthropic, and soon Gemini). It adds semantic caching, complexity-based routing, and spend control so you pay less for the same answers — without changing your code.
What does BYOK ("bring your own key") mean?
You use your own provider keys. Semantara never resells tokens or marks up your usage: you keep your billing relationship with OpenAI/Anthropic, and we only charge for the platform subscription.
How does semantic caching save me money?
When two questions mean the same thing even if worded differently, Semantara reuses the answer it already generated instead of calling (and paying) the model again. In real workloads, 30-40% of queries are semantically similar, which typically means 40-70% lower cost on those queries.
How is this different from OpenAI's or Anthropic's caching?
Providers cache identical prefixes (the exact same text). Semantara caches by meaning: "how do I reset my password?" and "I forgot my login, what now?" share one answer. And Semantara stacks on top of provider caching — it doesn't replace it.
What is complexity-based routing?
Semantara classifies each request and sends it to the right model: simple ones to a cheaper, faster model; complex ones to the powerful model. You pay for the premium model only when it's actually needed.
Do I have to change my code to integrate it?
No. Semantara exposes an OpenAI-compatible endpoint (/v1/chat/completions). Change the base URL (and optionally the model) and you're done; the rest of your integration stays the same.
Which providers are supported?
Today OpenAI and Anthropic, with Gemini on the way. Being OpenAI-compatible, you can unify several providers behind a single integration.
Are my keys and data secure?
Yes. Provider keys are encrypted with AES-256-GCM, every client is isolated in a multi-tenant environment, and we never use your data to train models. See the Security page for details.
How much can I really save?
It depends on how many queries repeat and on your model mix. Use the Savings Calculator to estimate your case; typical savings on repeated queries range from 40% to 70%.
Can I control my spend?
Yes. You track your usage and savings in real time from the dashboard —cost is computed on every request, per client and per key— around our core metric: dollars saved per month. And you set per-key rate limits to curb unexpected usage.
How do plans and billing work?
There's a Free, Pro, and Business plan with fixed pricing, plus custom Enterprise. Because it's BYOK, the subscription covers the platform; model usage is billed directly by your providers. See the Plans page.
Where is it hosted? Is there an on-premise option?
The managed version runs on cloud infrastructure. For enterprise needs with sensitive data, contact us: the Enterprise plan supports custom deployments.