v2.0 Model Released

Don't treat data
like flat text.

Standard embeddings flatten your JSON into a string, losing semantic hierarchy. Our Tree-Aware Transformer encodes structure, types, and nesting for superior retrieval.

input_data.json
{
  "user_id": 4829,
  "attributes": {
    "role": "admin",
    "access_level": 5,
    "department": "engineering"
  },
  "is_active": true
}
Standard Embedding StructVector

Structure Preserved ↑

Text models struggle with JSON

When you embed a JSON object with OpenAI or Cohere, it gets stringified. The model sees a soup of brackets and quotes, often confusing keys with values or losing depth.

Flattened Hierarchy

{"a": {"b": "value"}} looks the same as {"a_b": "value"} to a tokenizer. Context is lost.

Type Blindness

Standard models treat the number 104 and the string "104" identically. Our model knows the difference.

Token Bloat

Formatting characters like { } " : waste up to 40% of your context window. We embed the structure directly.

Structural Positional Encoding

Instead of a simple linear sequence, StructVector uses a custom GNN-Transformer hybrid architecture.

  • 1
    Key-Value Attention:

    Keys act as "prompts" for their values, creating a semantic bond before embedding.

  • 2
    Tree-Aware Encoding:

    Depth and Sibling relationships are encoded as embedding biases, preserving hierarchy.

  • 3
    Schema Agnostic:

    Works on dynamic JSON. No need to pre-define schemas or flatten your NoSQL data.

ARCHITECTURE_VIEW
Root Object
Key: "User"
Val: Object
Key: "ID"
Val: 1234
Pooling Layer
Output: vector[1536]

Drop-in replacement for OpenAI

Compatible with standard vector databases (Pinecone, Weaviate, Qdrant).

import structvector

client = structvector.Client(api_key="sv_...")

data = [
    {"product_id": 101, "specs": {"color": "red", "weight_kg": 1.5}},
    {"product_id": 102, "specs": {"color": "blue", "weight_kg": 1.5}}
]

# Embed structured data directly
embeddings = client.embed(
    inputs=data,
    model="json-v2-base"
)

print(embeddings[0].vector) # [0.021, -0.192, ...]

Benchmarks

JSON Retrieval Accuracy (NDCG@10)

Task: Retrieving documents based on nested field queries on a MongoDB dump.

StructVector v2 94.2%
OpenAI text-embedding-3-large 81.5%
Cohere English v3 79.8%

Ideal Use Cases

  • eCommerce Search: Matching products by nested specs (size, material) not just descriptions.
  • Log Analysis: Finding anomaly logs based on structural patterns rather than error messages.
  • RAG on NoSQL: Retrieving relevant user profiles or config objects for LLM contexts.

Simple, Volume-Based Pricing

Charged per million structured tokens (keys + values).

Developer

$0 / mo
  • • 1M tokens free / month
  • • 3 concurrent requests
  • • Community Support
Start Free
Popular

Production

$0.10 / 1M tokens
  • • Unlimited tokens
  • • 100 concurrent requests
  • • Priority Support
  • • 99.9% SLA
Get API Key

Enterprise

Custom
  • • VPC / On-prem deployment
  • • Custom fine-tuning
  • • Dedicated account manager
Contact Sales