DuckDB explained

DuckDB
Developer: DuckDB Labs
Latest Release Version: v1.1.3
Programming Language: C++
Operating System: Cross-platform
Genre: Column-oriented DBMS, RDBMS
License: MIT License

DuckDB is an open-source column-oriented relational database management system (RDBMS).[1] It is designed to provide high performance on complex queries against large databases in an embedded configuration, such as queries that combine tables with hundreds of columns and billions of rows. Unlike other embedded databases (for example, SQLite), DuckDB does not focus on transactional (OLTP) applications and is instead specialized for online analytical processing (OLAP) workloads.[2] The project has over 6 million downloads per month.[3] [4] [5]
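The difference between row-oriented and column-oriented storage can be illustrated with a minimal sketch. The table, its column names, and the helper functions below are hypothetical and only show the general idea; they do not reflect DuckDB's actual storage engine. An analytical aggregate over one column needs to read only that column in a columnar layout, whereas a row layout visits every full record.

```python
# Hypothetical toy table with columns (id, region, amount).
rows = [
    (1, "EU", 10.0),
    (2, "US", 25.5),
    (3, "EU", 4.5),
]

# Row-oriented layout: each record is kept together (the list above).
# Column-oriented layout: each column is kept as its own contiguous list.
columns = {
    "id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount_row_store(table):
    """Sum one field by visiting every full record."""
    return sum(record[2] for record in table)


def total_amount_column_store(table):
    """Sum one field by reading only the 'amount' column."""
    return sum(table["amount"])


row_total = total_amount_row_store(rows)
col_total = total_amount_column_store(columns)
```

Both functions return the same total; the columnar version simply never touches the `id` or `region` data, which is the property that OLAP-style queries over wide tables benefit from.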

History

DuckDB was originally developed by Mark Raasveldt and Hannes Mühleisen at the Centrum Wiskunde & Informatica (CWI) in the Netherlands.[6] The project co-founders designed DuckDB to address the need for an in-process OLAP database solution.[7] DuckDB was first released in 2019.[8] DuckDB version 1.0.0 was released on June 3, 2024 under the codename Snow Duck.[9]

Features

DuckDB uses a vectorized query processing engine.[10] It is unusual among database management systems in that it has no external dependencies and can be built with just a C++11 compiler.[11] DuckDB also deviates from the traditional client–server model by running inside a host process (it has bindings, for example, for a Python interpreter with the ability to directly place data into NumPy arrays[6]). DuckDB's SQL parser is derived from the pg_query library developed by Lukas Fittl, which is itself derived from PostgreSQL's SQL parser, stripped down as much as possible.[12] [13] DuckDB uses a single-file storage format to store data on disk, designed to support efficient scans and bulk updates, appends and deletes.[14]
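As a rough illustration of vectorized processing, the sketch below passes fixed-size batches ("vectors") of column values between operators instead of one value at a time. It is a conceptual toy in pure Python, not DuckDB's implementation; the batch size and all function names are assumptions made for the example.

```python
from typing import Iterator, List

VECTOR_SIZE = 4  # hypothetical batch size, chosen only for illustration


def scan(values: List[int], vector_size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Yield the column in fixed-size chunks ("vectors")."""
    for start in range(0, len(values), vector_size):
        yield values[start:start + vector_size]


def filter_greater_than(vectors: Iterator[List[int]], threshold: int) -> Iterator[List[int]]:
    """Apply the predicate to a whole vector per call, dropping empty results."""
    for vector in vectors:
        selected = [v for v in vector if v > threshold]
        if selected:
            yield selected


def sum_vectors(vectors: Iterator[List[int]]) -> int:
    """Aggregate by summing each incoming vector."""
    total = 0
    for vector in vectors:
        total += sum(vector)
    return total


column = list(range(10))  # the values 0 through 9
result = sum_vectors(filter_greater_than(scan(column), threshold=5))
```

Each operator is invoked once per batch rather than once per value, which amortizes per-call overhead; real engines additionally rely on tight loops over typed arrays, which this sketch does not model.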

Comparison

In its OLAP niche, DuckDB does not compete with traditional DBMSs such as MSSQL, PostgreSQL and Oracle Database. While using SQL for queries, DuckDB targets serverless applications and provides fast responses, including on data stored in Apache Parquet files. These attributes make it a popular choice for interactive analysis of large datasets, but some commenters have argued that the serverless nature of DuckDB makes it, as a standalone tool, "not so suitable for enterprise data warehousing".[15]

Commercial use

DuckDB is used at Facebook, Google, and Airbnb.[16]

DuckDB co-author Mühleisen also runs DuckDB Labs, a support and consultancy firm for the software.[8] The company has chosen not to take venture capital funding, stating "We feel investment would force the project direction towards monetization, and we would much prefer keeping DuckDB open and available for as many people as possible".[5] Another company, MotherDuck, has received $100 million in funding for its data platform based on DuckDB, with investors including Andreessen Horowitz.[17]

DuckDB Foundation

The independent non-profit DuckDB Foundation safeguards the long-term maintenance and development of DuckDB. The foundation holds much of the intellectual property of the project and is funded by charitable donations.[18] The DuckDB Foundation's statutes ensure DuckDB remains open-source under the MIT license in perpetuity.[19]

Language support

In addition to the native C and C++ APIs, DuckDB supports a range of programming languages.

Client APIs
Language Notes Reference
The Java API is implemented using JNI.[20] Integration with the Apache Arrow[21] format is provided. [22]
The Python API implements support for the Pandas,[23] Apache Arrow[24] and Polars data analysis packages. [25]
The Rust API is distributed as a Rust crate that exposes a wrapper over the native C API. [26]
Node API [27]
R API [28]
Julia API [29]
Swift API [30]

Notes and References

  1. Web site: DuckDB Documentation SQL Introduction . 2024-11-20 .
  2. Raasveldt . Mark . Mühleisen . Hannes . DuckDB: an Embeddable Analytical Database . ACM . 2019-06-25 . 978-1-4503-5643-5 . 10.1145/3299869.3320212 . 1981–1984.
  3. Web site: PyPi Download Stats . 2024-08-13 . www.pypistats.org . en . 2024-08-13 . https://web.archive.org/web/20240813165631/https://pypistats.org/packages/duckdb . live .
  4. Web site: DuckDB Python Downloads Dashboard . 2024-08-13 . duckdbstats.com . en . 2024-08-13 . https://web.archive.org/web/20240813165159/https://duckdbstats.com/ . live .
  5. Web site: Clark . Lindsay . DuckDB Labs puts limit on free support, rules out VC funding . 2024-03-23 . www.theregister.com . en . 2024-03-23 . https://web.archive.org/web/20240323064605/https://www.theregister.com/2023/10/05/duckdb_labs_puts_limit_on_vc_funds/ . live .
  6. Book: Kamphuis, Chris . Advances in Information Retrieval . Graph Databases for Information Retrieval . Lecture Notes in Computer Science . Springer International Publishing . Cham . 12036 . 2020 . 978-3-030-45441-8 . 7148032 . 10.1007/978-3-030-45442-5_79 . 608–612.
  7. van der Ent . Leendert . April 2023 . DuckDB: Introducing a New Class of Data Management Systems . I/O Magazine . ICT Research Platform Nederland . 12 November 2024.
  8. Web site: Clark . Lindsay . DuckDB reaches version 0.5.0 . 2024-03-23 . www.theregister.com . en . 2024-03-07 . https://web.archive.org/web/20240307163220/https://www.theregister.com/2022/09/09/duckdb_0_5_0/ . live .
  9. Web site: Raasveldt . Mark . Mühleisen . Hannes . Announcing DuckDB 1.0.0 . 3 June 2024 . 12 November 2024.
  10. Raasveldt . Mark . Mühleisen . Hannes . DuckDB: an Embeddable Analytical Database . ACM . 2019-06-25 . 978-1-4503-5643-5 . 10.1145/3299869.3320212 . 1981–1984.
  11. Web site: DuckDB Building Instructions . 2024-08-16 .
  12. Raasveldt . Mark . Mühleisen . Hannes . DuckDB: an Embeddable Analytical Database . ACM . 2019-06-25 . 978-1-4503-5643-5 . 10.1145/3299869.3320212 . 1981–1984.
  13. Web site: Slot . Marco . How We Fused DuckDB into Postgres with Crunchy Bridge for Analytics . 24 May 2024 . 12 November 2024.
  14. Raasveldt . Mark . Mühleisen . Hannes . Data Management for Data Science Towards Embedded Analytics . Conference on Innovative Data Systems Research . 2020 .
  15. Book: Bannert, M. . Research Software Engineering: A Guide to the Open Source Ecosystem . CRC Press . Chapman & Hall/CRC Data Science Series . 2024 . 978-1-04-000513-2 . 2024-03-23 . 25 . 2024-03-23 . https://web.archive.org/web/20240323010627/https://books.google.com/books?id=yWL7EAAAQBAJ&pg=PT25 . live .
  16. Web site: Clark . Lindsay . Scale-up database wrangler MotherDuck scores $47.5 million . 2024-03-23 . www.theregister.com . en . 2024-03-23 . https://web.archive.org/web/20240323064604/https://www.theregister.com/2022/11/17/475_million_says_scaleup_databases/ . live .
  17. Web site: Clark . Lindsay . MotherDuck serverless analytics platform wins $52.5M funding . 2024-03-23 . www.theregister.com . en . 2024-03-23 . https://web.archive.org/web/20240323064604/https://www.theregister.com/2023/09/21/motherduck_funding/ . live .
  18. Web site: DuckDB Foundation . 2024-11-09 .
  19. Web site: DuckDB Project FAQs . 2024-11-09 .
  20. Web site: Java JNI Source Code . 2024-09-07 . www.github.com . en.
  21. Web site: DuckDB Java Arrow Source Code . www.github.com . 2024-09-07.
  22. Web site: DuckDB Java Source Code . 2024-09-07 . www.github.com . en .
  23. Web site: DuckDB Pandas Source . 2024-09-07 . www.github.com . en .
  24. Web site: DuckDB PyArrow Source . 2024-09-07 . www.github.com . en .
  25. Web site: DuckDB Python Source Code . 2024-09-07 . www.github.com . en .
  26. Web site: DuckDB Rust Source Code . 2024-09-07 . www.github.com . en .
  27. Web site: DuckDB Node Source Code . 2024-09-07 . www.github.com . en .
  28. Web site: DuckDB R Source Code . 2024-09-07 . www.github.com . en .
  29. Web site: DuckDB Julia Source Code . 2024-09-07 . www.github.com . en .
  30. Web site: DuckDB Swift Source Code . www.github.com . 2024-09-07.