While once considered nice-to-have features for an application database, analytics now represent a competitive advantage and drives better customer experiences across enterprise and consumer applications. Developers across every industry are increasingly embedding powerful insights directly into their applications. Weâve observed the rise of data-intensive apps that require both greater access to disparate data sources and sub-second response times.
Most new applications today are powered by Postgres and itâs no surprise. In the 2024 Stack Overflow Developer Survey, Postgres ranked #1 for the most desired and #1 for the most admired database on earth of the 51,992 responses. Postgres, for all intents and purposes, is the application database of 2024 powering 10s of thousands of apps. While Postgres has many strengths, itâs analytics limitations canât reasonably serve the new generation of data-intensive apps that place emphasis on analytics and interactivity.
There are many old and some new ways to tweak analytics performance on Postgres; most quick fixes donât work well and scale poorly when requirements shift and complexity grows. Modern developers should have access to a state-of-the-art analytics engine within their application database.
â
A simple, low-cost, open source, ducky architecture
Weâre happy to announce pg_duckdb, an open source (MIT licensed) project that embeds DuckDBâs columnar-vectorized analytics engine and features into Postgres. Built in collaboration with the DuckDB Foundation and MotherDuck, pg_duckdb is for developing high performance applications and analytics.
pg_duckdb adds to Postgres:
- Access to broader dataset in Object Storage
- State-of-the-art analytics performance
- DuckDB features, such as COPY TO parquet format
â
DuckDB abilities meet Postgres
If youâd like to read the full list of features, check out the pg_duckdb README
â
Getting Started
The best way to get started is to connect Postgres to a new or existing object storage bucket (AWS S3, Cloudflare R2, or Google GCS) with pg_duckdb. You can query data in Parquet, CSV, and Iceberg format using read_parquet, read_csv, and iceberg_scan respectively.
1. Add a credential to enable DuckDB's httpfs support.
INSERT INTO duckdb.secrets(cloud_type, cloud_id, cloud_secret, cloud_region)
VALUES ('S3', 'access_key_id', 'secret_accss_key', 'us-east-1');
2. Copy data directly to your bucket - no ETL pipeline!
COPY (SELECT user_id, item_id, price, purchased_at FROM purchases)
TO 's3://your-bucket/purchases.parquet;
3. Perform analytics on your data.
SELECT SUM(price) AS total, item_id
FROM read_parquet('s3://your-bucket/purchases.parquet')
AS (price float, item_id int)
GROUP BY item_id
ORDER BY total DESC
LIMIT 100;
ââ
â
Open Source Supporters
Weâve been overwhelmed by positive comments and support from the open source community. After announcing pg_duckdb at DuckCon on Thursday, our GitHub repo trended over the weekend!
Our team would like to offer a few special words of gratitude to the following supporters, many of whom are providing financial backing to the development of pg_duckdb:
Ovais Tariq of Tigris Data, the Globally Distributed S3-Compatible Object Store
Tom Hacohen of Svix, Webhooks as a Service
Owain Brennan of SeerBI, Data Science / ML lab
As well as the teams from Infisical, Mintlify, Dagster, ElectricSQL, Nextgres, GoldSky, Drizzle, SuperTokens, TableFlow, QuestDB, PostgresML, NeoSync, Firecrawl, Positional and more.
If youâd like to support the open source development of pg_duckdb, please head over to hydra.so and click the âBecome a Supporterâ button to learn more.
â
Looking Forward: MotherDuck
The pg_duckdb extension will be fully capable of querying against data stored in the cloud in MotherDuck as if it were local. MotherDuck's "dual execution" capabilities let us join local Postgres data against MotherDuck data seamlessly, and we will figure out the best place to run the query. As a user, you don't really need to care where the computation runs, we'll just figure out how to make it run fast.
Moreover, it is common in analytics to want to offload your data from your transactional database into an analytical store. The pg_duckdb extension along with MotherDuck can help; you can just run a query in Postgres that pulls recent data from your Postgres database and write it to MotherDuck. You don't need to export and reimport data, or set up CDC.
Analytics can be resource hungry in terms of the amount of memory and CPU needed to make it run well. Depending on use-case requirements, teams may prefer using MotherDuck to help offload analytics to the cloud. Ideally, developers won't even have to change the queries that they're running; they just get faster.
Head over to MotherDuckâs post âSplicing Duck and Elephant DNAâ to learn more about the coming MotherDuck integration.
â