Choo-choo!
Please take a ride on TrainDB!

Toward Data-Less Big Data Analytics




About TrainDB Project


TrainDB is an ML-based approximate query processing engine that aims to answer time-consuming analytical queries in a few seconds.

TrainDB will provide a SQL-like query interface and support various DBMS data sources.

TrainDB is an open source project, developed mainly by ETRI, RealTimeTech, BI Matrix, BigDyL at KAIST (formerly at Yonsei University), and BigComLab at Kwangwoon University.


Various Types of Approximate Queries

  • Support for 10+ analytic/aggregate operations
  • Support for complex aggregate queries (including grouping and joins)
  • Approximate queries over tabular and spatiotemporal data
  • Query processing within a given error bound or response time
  • What-if queries on non-existing data with an adjusted data distribution
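
To make the idea of answering a query within a given error bound concrete, here is a minimal Python sketch. It does not show TrainDB's ML-based method or its query syntax; it swaps in plain uniform sampling with a normal-approximation confidence interval, and the function name `approximate_avg` and all of its parameters are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def approximate_avg(values, error_bound, z=1.96, batch_size=500, seed=0):
    """Estimate AVG(values) from a growing random sample.

    Hypothetical illustration only (not TrainDB's API). Sampling stops once
    the confidence-interval half-width z * s / sqrt(n) is at most
    `error_bound`, or once every value has been read.
    Returns (estimate, half_width, rows_read).
    """
    if not values:
        raise ValueError("values must not be empty")

    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # any prefix of a shuffled order is a uniform sample

    sample = []
    half_width = math.inf
    n = 0
    for start in range(0, len(order), batch_size):
        sample.extend(values[i] for i in order[start:start + batch_size])
        n = len(sample)
        if n == len(values):
            half_width = 0.0  # full scan: the answer is exact
            break
        if n < 2:
            continue  # a sample standard deviation needs two points
        half_width = z * statistics.stdev(sample) / math.sqrt(n)
        if half_width <= error_bound:
            break  # requested error bound reached, stop early
    return statistics.fmean(sample), half_width, n


if __name__ == "__main__":
    gen = random.Random(42)
    table = [gen.gauss(100.0, 15.0) for _ in range(100_000)]
    estimate, half_width, rows_read = approximate_avg(table, error_bound=1.0)
    print(f"AVG ~ {estimate:.2f} +/- {half_width:.2f} using {rows_read} rows")
```

Because the estimate is refined batch by batch, the same loop could also report intermediate results, or stop on a time budget instead of an error bound.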

Convenient User Environment

  • Data-driven ML models that can be trained without query history
  • Support for various DBMS data sources
  • Approximate query answering in cloud/portable environments, even when the connection to the DBMS is lost
  • Incremental query processing for interactive user experience

Powerful Tools and Application Services

  • Visualization tools for exploratory data analysis
  • Recommendation of target tables/attributes for automated ML model training
  • Support for ML model migration
  • Application services for demonstration in various business fields