Standard embeddings flatten your JSON into a string, losing semantic hierarchy. Our Tree-Aware Transformer encodes structure, types, and nesting for superior retrieval.
{
  "user_id": 4829,
  "attributes": {
    "role": "admin",
    "access_level": 5,
    "department": "engineering"
  },
  "is_active": true
}
Structure Preserved ↑
When you embed a JSON object with OpenAI or Cohere, it gets stringified. The model sees a soup of brackets and quotes, often confusing keys with values or losing depth.
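{"a": {"b": "value"}} and {"a_b": "value"} look nearly identical once they are stringified and tokenized, so the model gets little signal that one is nested and the other is flat. Context is lost.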
Once stringified, the number 104 and the string "104" differ only by a pair of quotes, so standard models tend to treat them identically. Our model knows the difference.
Formatting characters like { } " : waste up to 40% of your context window. We embed the structure directly.
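The three problems above can be reproduced with a few lines of plain Python. This is a minimal sketch, not StructVector code: `naive_flatten` and `punctuation_share` are hypothetical helpers written for illustration, and the overhead figure counts characters rather than model tokens, so it only approximates context-window usage.

```python
import json


def naive_flatten(obj, prefix=""):
    """Flatten nested dicts by joining keys with "_" and stringifying values."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(naive_flatten(value, name))
        else:
            flat[name] = str(value)  # type information is discarded here
    return flat


def punctuation_share(obj):
    """Rough fraction of serialized characters that are JSON syntax."""
    text = json.dumps(obj, separators=(",", ":"))
    syntax = sum(text.count(ch) for ch in '{}[]":,')
    return syntax / len(text)


# 1. Depth is lost: a nested key and a flat key collapse to the same record.
nested = {"a": {"b": "value"}}
flat = {"a_b": "value"}
print(naive_flatten(nested) == naive_flatten(flat))  # True

# 2. Types are lost: the number 104 and the string "104" become equal.
print(naive_flatten({"id": 104}) == naive_flatten({"id": "104"}))  # True

# 3. Syntax overhead: share of characters spent on brackets, quotes, etc.
record = {
    "user_id": 4829,
    "attributes": {"role": "admin", "access_level": 5, "department": "engineering"},
    "is_active": True,
}
print(f"{punctuation_share(record):.0%} of characters are formatting")
```

The exact overhead depends on the document and on the tokenizer in use, so treat the printed share as an indication rather than a measurement.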
Instead of reading JSON as a simple linear sequence of tokens, StructVector uses a custom GNN-Transformer hybrid architecture.
Keys act as "prompts" for their values, creating a semantic bond before embedding.
Depth and sibling relationships are encoded as embedding biases, preserving hierarchy.
Works on dynamic JSON. No need to pre-define schemas or flatten your NoSQL data.
Compatible with standard vector databases (Pinecone, Weaviate, Qdrant).
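To make the ideas of key-value pairing, depth, and sibling position more concrete, here is a rough sketch of the kind of structural features a tree-aware encoder could read from a JSON document. It is an illustration only, not the StructVector architecture: the `Node` tuple, `json_type`, and `walk` are hypothetical names, and how such features would be turned into embedding biases is left out.

```python
from typing import Any, Iterator, NamedTuple


class Node(NamedTuple):
    path: tuple      # keys and list indices from the root to this value
    depth: int       # nesting level (top-level fields have depth 1)
    sibling: int     # position among the entries of the same parent
    type_name: str   # JSON type of the value
    value: Any


def json_type(value: Any) -> str:
    """Map a Python scalar to its JSON type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    raise TypeError(f"unsupported JSON value: {value!r}")


def walk(obj: Any, path: tuple = ()) -> Iterator[Node]:
    """Yield one Node per scalar value, keeping its key path and position."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = list(enumerate(obj))
    else:
        return
    for sibling, (key, value) in enumerate(items):
        child_path = path + (key,)
        if isinstance(value, (dict, list)):
            yield from walk(value, child_path)
        else:
            yield Node(child_path, len(child_path), sibling, json_type(value), value)


doc = {
    "user_id": 4829,
    "attributes": {"role": "admin", "access_level": 5, "department": "engineering"},
    "is_active": True,
}
nodes = list(walk(doc))
for node in nodes:
    print(node)
```

With this representation, {"a": {"b": "value"}} and {"a_b": "value"} produce different paths and depths, and 104 and "104" produce different type names, so the distinctions a stringified input blurs stay explicit. No schema is declared up front; the walk follows whatever shape each document has.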
import structvector

client = structvector.Client(api_key="sv_...")

data = [
    {"product_id": 101, "specs": {"color": "red", "weight_kg": 1.5}},
    {"product_id": 102, "specs": {"color": "blue", "weight_kg": 1.5}},
]

# Embed structured data directly
embeddings = client.embed(
    inputs=data,
    model="json-v2-base"
)

print(embeddings[0].vector)  # [0.021, -0.192, ...]
Task: Retrieving documents based on nested field queries on a MongoDB dump.
Charged per million structured tokens (keys + values).