StructVector | Specialized JSON Embeddings

Text models struggle with JSON

When you embed a JSON object with OpenAI or Cohere, it gets stringified. The model sees a soup of brackets and quotes, often confusing keys with values or losing depth.

Flattened Hierarchy

{"a": {"b": "value"}} looks the same as {"a_b": "value"} to a tokenizer. Context is lost.

Type Blindness

Standard models treat the number 104 and the string "104" identically. Our model knows the difference.

Token Bloat

Formatting characters like { } " : waste up to 40% of your context window. We embed the structure directly.

Structural Positional Encoding

Instead of a simple linear sequence, StructVector uses a custom GNN-Transformer hybrid architecture.

1

Key-Value Attention:
Keys act as "prompts" for their values, creating a semantic bond before embedding.
2

Tree-Aware Encoding:
Depth and Sibling relationships are encoded as embedding biases, preserving hierarchy.
3

Schema Agnostic:
Works on dynamic JSON. No need to pre-define schemas or flatten your NoSQL data.

ARCHITECTURE_VIEW

Root Object

Key: "User"

Val: Object

Key: "ID"

Val: 1234

Pooling Layer

Output: vector[1536]

Drop-in replacement for OpenAI

Compatible with standard vector databases (Pinecone, Weaviate, Qdrant).

import structvector

client = structvector.Client(api_key="sv_...")

data = [
    {"product_id": 101, "specs": {"color": "red", "weight_kg": 1.5}},
    {"product_id": 102, "specs": {"color": "blue", "weight_kg": 1.5}}
]

# Embed structured data directly
embeddings = client.embed(
    inputs=data,
    model="json-v2-base"
)

print(embeddings[0].vector) # [0.021, -0.192, ...]

Benchmarks

JSON Retrieval Accuracy (NDCG@10)

Task: Retrieving documents based on nested field queries on a MongoDB dump.

StructVector v2 94.2%

OpenAI text-embedding-3-large 81.5%

Cohere English v3 79.8%

Ideal Use Cases

eCommerce Search: Matching products by nested specs (size, material) not just descriptions.
Log Analysis: Finding anomaly logs based on structural patterns rather than error messages.
RAG on NoSQL: Retrieving relevant user profiles or config objects for LLM contexts.

Simple, Volume-Based Pricing

Charged per million structured tokens (keys + values).

Developer

$0 / mo

• 1M tokens free / month
• 3 concurrent requests
• Community Support

Start Free

Popular

Production

$0.10 / 1M tokens

• Unlimited tokens
• 100 concurrent requests
• Priority Support
• 99.9% SLA

Get API Key

Enterprise

Custom

• VPC / On-prem deployment
• Custom fine-tuning
• Dedicated account manager

Contact Sales