How it works¶
OpaqueDB matches a query against stored rows without decrypting either. This page explains the matching algorithm, the data model it runs over, and the FHE parameters that make it fit.
The data model¶
A schema is one CREATE TABLE name (col TYPE [KEY], ...). Types are INT,
REAL, and TEXT. Exactly one column is the match KEY (INT or TEXT, not
REAL).
- The match key maps into a
2^key_bitsuniverse.INTmaps directly (range-checked).TEXThashes through a deterministic FNV hash, so aTEXTmatch is a candidate: collisions are possible but unlikely, and a largerkey_bitslowers the rate. Client-side verification ofTEXTmatches is a tracked TODO. - Payload columns are returned, not matched. They are packed at ingest (int and real take 8 bytes, text takes a 2-byte length plus bytes) and decoded on the client.
The matching algorithm¶
Matching is bit-sliced equality over SIMD slots. The match key is binary
expanded into key_bits slots per record, and many records are packed into the
slots of a single ciphertext. The query value is bit-expanded and tiled across
all slots, so the query travels as one ciphertext.
For each batch ciphertext the server computes, in parallel across the whole batch:
- Per-bit difference.
diff = query - key(asub_plain), thensq = diff * diff(asquare). This gives(q - k)^2, which is 0 where bits match and 1 where they differ. - Per-bit XNOR.
eq = 1 - sq, the per-bit equality indicator. - AND across each record's bits. A rotate-and-multiply tree of depth
log2(key_bits)reduces the per-bit indicators into one equality indicator per record. - Block masking and broadcast. A plaintext mask blocks record starts, then a doubling masked broadcast spreads the indicator. This uses no plaintext multiply, so it spends no noise and causes no cross-block bleed.
- Retrieve. For each payload plane, a
multiply_plainby the packed payload, then a block-sum into block 0. Only the matching record's payload survives.
The two SIMD slot rows are not merged with rotate_columns. The matched payload
lands in block 0 of one row, and the client sums the two row-0 blocks.
Why this is cheap¶
The whole scan costs only 1 + log2(key_bits) ciphertext-times-ciphertext
multiplies per batch (the dominant FHE cost), regardless of how many records the
batch holds. At key_bits = 16 that is multiplicative depth 5. The expensive
multiplies amortize across thousands of rows.
Empty results¶
A no-match must look the same as a match to the operator. The backend appends a
"presence" ciphertext, sum_r b_r, the match count. On decrypt the client reads
this first and returns nothing at zero. A no-match is therefore an encrypted
empty result, and the operator never learns whether a query matched.
Combining shards¶
In a cluster each shard evaluates its own partial. CombinePartials adds the
shard partials plane-wise. For this to be correct the data must be sharded
disjoint (consistent hash); replicating the full set would double-count.
FHE parameters¶
The parameters live once in config and are read by the crypto layer. Defaults:
| Parameter | Default | Notes |
|---|---|---|
poly_modulus_degree |
16384 | Required. The depth-5 pipeline exhausts the budget at 8192. |
coeff_modulus_bits |
[60, 60, 60, 60, 60, 49] |
349 bits, under the 438-bit degree-16384 limit. Leaves a measured ~52-bit budget. |
plain_modulus_bits |
20 | |
key_bits |
16 | Key universe is 2^key_bits. Raising it deepens the AND tree and needs more primes. |
Tune against examples/crypto_bench, which times the SEAL primitives the
matcher uses (rotation, square, ciphertext multiply) at these parameters. The
reference backend test asserts a positive noise budget after the full pipeline.
Keys¶
The client generates its own keys and registers the public material. Galois keys are generated, but only the rotation steps the matcher actually uses (power-of-two AND and broadcast steps plus the block-sum, roughly 17 keys).
At poly 16384 the reduced Galois set is about 125 MB, which is too big for one
gRPC message. The Register RPC streams it in 4 MB chunks, and the coordinator
forwards it to peers over a streaming RegisterKeys RPC. The relin keys are
about 7.5 MB and the public key about 1.5 MB. The client never sends its secret
key.
Reference¶
The private matching builds on the homomorphic-encryption techniques described in:
Karaçay, L., Savaş, E., and Alptekin, H. (2020). Intrusion Detection Over Encrypted Network Data. The Computer Journal, 63(4), 604-619. doi:10.1093/comjnl/bxz111