Donghyun Sohn 손동현

PhD Candidate · Northwestern University

Database Systems & Privacy-Preserving Computation

I'm Donghyun Sohn, a Ph.D. candidate in Computer Science at Northwestern University, advised by Professor Jennie Rogers.

My research focuses on building secure and efficient data processing systems at the intersection of databases and cryptography. I develop practical query engines that use secure multi-party computation (MPC) and fully homomorphic encryption (FHE) to run sensitive queries without compromising security.

More broadly, I'm interested in database performance debugging and regression testing, as well as the bidirectional relationship between AI and database systems — both applying LLMs to make databases more accessible, and using database techniques to make AI systems more efficient.

Apr 2026 ScanTwin paper accepted at SeQureDB '26 (co-located with SIGMOD '26)
Mar 2026 ScanTwin paper submitted to SeQureDB '26 (co-located with SIGMOD '26)
Nov 2025 Program Committee Member, SIGMOD Availability & Reproducibility Initiative 2025
Fall 2025 Teaching Assistant, CS 339: Intro to Database Systems, Northwestern
May 2025 Paper accepted at VLDB 2025 — Alchemy: A Query Optimization Framework for Oblivious SQL
01 · CRYPTOGRAPHY × DATABASE SYSTEMS
Privacy-Preserving Query Execution
Query optimization and systems engineering for secure SQL execution using cryptographic protocols without ever exposing raw data.
VLDB 2025
Alchemy: Oblivious Query Optimization
Protocol-agnostic optimization framework combining circuit-aware cost modeling with classical query optimization for MPC-based SQL execution. Achieves up to 100× speedup across multiple cryptographic protocols.
MPC · Cost Model · Query Optimization
In Submission
HAMMER: FHE Analytical Query Engine
OpenFHE-based columnar engine for encrypted OLAP workloads. Optimized via SIMD batching, multi-threading, and a hybrid FHE-to-MPC conversion layer for practical encrypted analytics.
FHE · OpenFHE · OLAP · SW/HW Co-design
02 · DATABASE PERFORMANCE & PRIVACY
Privacy-Compliant Regression Testing
Differential privacy frameworks for OLAP performance regression — generating synthetic workloads that preserve statistical fidelity without exposing sensitive data.
SeQureDB (SIGMOD 2026 Workshop)
ScanTwin: Differential Privacy for OLAP Regression
Framework for generating DP-compliant synthetic datasets from Parquet footers for scan operator benchmarking. Extracts per-row-group sketches without accessing raw data.
Differential Privacy · Parquet · DuckDB
In Progress
Extending ScanTwin to Full Operator Coverage
Generalizing the ScanTwin framework beyond scan operators to joins, aggregations, and filters — enabling end-to-end DP-safe performance regression across analytical query pipelines.
Query Operators · Benchmarking · Privacy
03 · AI × DATABASES
Schema-Aware Natural Language Interfaces
Investigating how LLMs can be grounded in structured schema knowledge to generate accurate SQL — targeting the gap between natural language intent and database-specific structure.
In Progress
HSTG: Hierarchical Schema-Topology Graph for NL2SQL
Investigating LLM-based NL2SQL translation with a focus on schema incompleteness as a structural bottleneck. Evaluating retrieval-augmented approaches on the BIRD benchmark.
NL2SQL · LLM · BIRD · RAG

Publications

Alchemy: A Query Optimization Framework for Oblivious SQL
Donghyun Sohn, Kelly Jiang, Nicolas Hammer, Jennie Rogers
Proceedings of the VLDB Endowment, 2025.

Everything You Always Wanted to Know About Secure and Private Database Systems (but were Afraid to Ask)
Donghyun Sohn, Xiling Li, Jennie Rogers
Data Engineering Bulletin, 2023.