
I only use CH for work so I'll read about this more on Monday, but I shudder to think of the caveats. We have used cancelling rows and now one of the merge engines that just needs a version column (higher replaces lower). No database has ever driven me more mad than ClickHouse. If your workload is append-only/insert-only then congrats, it's amazing, you'll have a great time. If you need to update data... well, strap in.
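
For concreteness, here's roughly what the version-based approach looks like (table and column names are made up):

    CREATE TABLE user_stats
    (
        user_id UInt64,
        score   Int64,
        version UInt64   -- higher version wins when parts merge
    )
    ENGINE = ReplacingMergeTree(version)
    ORDER BY user_id;

    -- an "update" is just an insert with a higher version
    INSERT INTO user_stats VALUES (42, 100, 1);
    INSERT INTO user_stats VALUES (42, 150, 2);

    -- merges happen in the background, so dedupe at read time with FINAL
    SELECT * FROM user_stats FINAL WHERE user_id = 42;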

As long as you can get away with Postgres, stay with Postgres. I'm sure this update here is a step forward just like version-merging is much better than cancelling rows but it's always got a ton of downsides.

Unrelated to updating data, the CH defaults drive me insane; the null join behavior alone made me seriously consider ripping CH out of our infrastructure (after wasting too long trying to figure out why my query "wasn't working").
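
As far as I can tell, the culprit is the default join_use_nulls = 0: unmatched rows in outer joins get type defaults (0, '') instead of NULL. A rough illustration with made-up tables:

    CREATE TABLE users  (id UInt64, name String)         ENGINE = MergeTree ORDER BY id;
    CREATE TABLE orders (user_id UInt64, amount UInt64)  ENGINE = MergeTree ORDER BY user_id;
    INSERT INTO users  VALUES (1, 'a'), (2, 'b');
    INSERT INTO orders VALUES (1, 10);

    -- default behavior (join_use_nulls = 0): user 2 gets amount 0, not NULL
    SELECT u.id, o.amount
    FROM users AS u
    LEFT JOIN orders AS o ON u.id = o.user_id;

    -- standard SQL behavior: unmatched columns come back as NULL
    SELECT u.id, o.amount
    FROM users AS u
    LEFT JOIN orders AS o ON u.id = o.user_id
    SETTINGS join_use_nulls = 1;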

Lastly I'll say, if CH does what you need and you are comfortable learning all the ins and outs, then it can do some really cool things. But it's important to remember it's NOT a normal RDBMS, nor can you use it like one. I almost wish they didn't use SQL as the query language; then people would think about it differently, myself included.





Very interesting take — I see where you’re coming from. Yes, there are caveats and differences between ClickHouse and Postgres. Much of this stems from the nature of the workloads they are built for: Postgres for OLTP and ClickHouse for OLAP.

We’ve been doing our best to address and clarify these differences, whether through product features like this one or by publishing content to educate users. For example: https://clickhouse.com/blog/postgres-to-clickhouse-data-mode... https://www.youtube.com/watch?v=9ipwqfuBEbc.

From what we’ve observed, the learning curve typically ranges from a few weeks for smaller to medium migrations to 1–2 months for larger ones moving real-time OLAP workloads from Postgres to ClickHouse. Still, customers are making the switch and finding value — hundreds (or more) are using both technologies together to scale their real-time applications: Postgres for low-latency, high-throughput transactions and ClickHouse for blazing-fast (100x faster) analytics.

We’re actively working to bridge the gap between the two systems, with features like faster UPDATEs, enhanced JOINs and more. That’s why I’m not sure your comment is fully generalizable — the differences largely stem from the distinct workloads they support, and we’re making steady progress in narrowing that gap.

- Sai from the ClickHouse team here.


How much of the ISO/IEC 9075:2023 SQL standard does CH conform to?

What would be the best Postgres + CH setup to combine both? Something using CDC to capture changes and apply them to CH?

Great question. Exactly that: CDC from Postgres to ClickHouse, and adapting the application to start using ClickHouse for analytics. Through the PeerDB acquisition, ClickHouse now has native CDC capabilities that work at any scale (from a few tens of GB to tens of TB of Postgres data). You can use ClickPipes if you're on ClickHouse Cloud, or PeerDB if you're using ClickHouse OSS.

Sharing a few links for reference: https://clickhouse.com/docs/integrations/clickpipes/postgres https://github.com/PeerDB-io/peerdb https://clickhouse.com/cloud/clickpipes/postgres-cdc-connect... https://clickhouse.com/blog/clickhouse-acquires-peerdb-to-bo...

Here is a short demo/talk that we did at our annual conference, Open House, covering this reference architecture: https://clickhouse.com/videos/postgres-and-clickhouse-the-de...
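
Roughly, the destination table these pipelines create is a ReplacingMergeTree keyed on the Postgres primary key, with a change version and a soft-delete flag. The column names below are illustrative, so check the docs above for the exact shape:

    CREATE TABLE public_users
    (
        id                 Int64,
        email              String,
        _peerdb_version    Int64,    -- monotonically increasing change version
        _peerdb_is_deleted UInt8     -- tombstone for rows deleted in Postgres
    )
    ENGINE = ReplacingMergeTree(_peerdb_version)
    ORDER BY id;

    -- read the current state, hiding rows deleted upstream
    SELECT id, email
    FROM public_users FINAL
    WHERE _peerdb_is_deleted = 0;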


Funny, I had the exact same frustration, also with nulls and a left join. I did end up ripping it out and doing it over again with Timescale (ugh, okay, Tiger Data). The ability to use normal Postgres things plus timeseries columnar storage is really cool. I don't have big data though, just big enough where some tables got slow enough to worry about such things and not big enough to stomach basic SQL not working.

We've been using ClickHouse ReplacingMergeTree tables for updates without any issues...in fact, they've been more than reliable for our use case. For us, as long as updated data is visible within 15–30 minutes, that's acceptable. What's your ingest vs. update volume per hour and per minute?
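
If we ever need the merged state materialized sooner than the background merges provide, an occasional forced merge works too (my_replacing_table is a placeholder name):

    -- force a merge so the deduplicated state is materialized on disk
    -- (heavy operation; fine to run off-peak if 15-30 min staleness is OK)
    OPTIMIZE TABLE my_replacing_table FINAL;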

There's also the new CoalescingMergeTree, which seems very useful for many classic roll-up problems: ideal for materializing a recent view on top of the append-only log of data that is ClickHouse's natural strong point. https://clickhouse.com/blog/clickhouse-25-6-coalescingmerget... https://news.ycombinator.com/item?id=44656436
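
A minimal sketch of how that might look, based on the linked 25.6 post (names are made up; the exact engine parameters are in the post):

    CREATE TABLE device_state
    (
        device_id   UInt64,
        temperature Nullable(Float64),
        firmware    Nullable(String)
    )
    ENGINE = CoalescingMergeTree
    ORDER BY device_id;

    -- partial rows: each insert only carries the columns that changed
    INSERT INTO device_state (device_id, temperature) VALUES (1, 21.5);
    INSERT INTO device_state (device_id, firmware)    VALUES (1, 'v2.3');

    -- coalesces to the latest non-NULL value per column for each key
    SELECT * FROM device_state FINAL;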

For general mutable data, ClickHouse is trying super hard to get much better & doing amazing engineering. But it feels like it'll be a long time before the fortress of Postgres for OLTP is breached. https://about.gitlab.com/blog/two-sizes-fit-most-postgresql-... https://news.ycombinator.com/item?id=44895954

The top submission is the end of a 4-part series. Part two is really nice on the details of how ClickHouse has focused on speeding up updates: recommend a read! https://clickhouse.com/blog/updates-in-clickhouse-2-sql-styl...
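
For anyone who hasn't used them, the classic mutation syntax that part two builds on looks like this (table and column names invented):

    -- a mutation rewrites the affected data parts asynchronously
    ALTER TABLE page_views UPDATE status = 'archived' WHERE view_date < '2024-01-01';

    -- progress can be watched in system.mutations
    SELECT command, is_done FROM system.mutations WHERE table = 'page_views';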


I agree. I've been on CH since v20 and I thought I was the only one who noticed that they've been working very hard to bridge the gap between OLAP and OLTP. Sure, they'll always be a first-class OLAP DB… but if you know how to get dangerous with its strengths, making it the go-to datalake for your existing OLTP is pretty freaking awesome. Thanks for those shares.

CH is better for analytics, where append-only is the normal mode of operation, but I've used it in the past as an index: store a copy of the data in ClickHouse and use its vectorized columnar operations for ad hoc queries (the kind where indexes don't help because the user may query by any field they like). This can work well if your data is append-mostly and you do a rebuild yourself after a while, but the way it sounds, ClickHouse is making it possible to get that to work well with a higher ratio of updates.
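
Something like this, where the point is that the WHERE clause can hit any column without pre-built indexes (schema is made up):

    CREATE TABLE events_copy
    (
        ts         DateTime,
        user_id    UInt64,
        country    LowCardinality(String),
        user_agent String,
        latency_ms UInt32
    )
    ENGINE = MergeTree
    ORDER BY ts;

    -- an arbitrary ad hoc filter; the full-column scan is vectorized
    SELECT country, count() AS hits, avg(latency_ms) AS avg_latency
    FROM events_copy
    WHERE user_agent LIKE '%Firefox%' AND latency_ms > 500
    GROUP BY country
    ORDER BY hits DESC;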

Either way, CH shouldn't be the store of truth when you need record level fidelity.


You might want to think about converting your updates into some sort of event sourcing scheme where you insert new rows and then do aggregation. That pattern would be more appropriate for ClickHouse.
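
Something along these lines, with made-up names: every change is appended as an event, and the current state is derived with an aggregate instead of ever updating a row.

    CREATE TABLE order_events
    (
        order_id   UInt64,
        status     String,
        updated_at DateTime
    )
    ENGINE = MergeTree
    ORDER BY (order_id, updated_at);

    -- append an event per change instead of updating in place
    INSERT INTO order_events VALUES (7, 'created', now());
    INSERT INTO order_events VALUES (7, 'shipped', now());

    -- current state = latest event per order
    SELECT order_id, argMax(status, updated_at) AS current_status
    FROM order_events
    GROUP BY order_id;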

If you need updates then perhaps ClickHouse isn't the perfect choice. Something like ScyllaDB might be a better compromise if you want performant updates with (some sort of) consistency guarantees. If you need stronger guarantees you will need a "proper" database, but then you're unlikely to get the performance. AKA tradeoffs, or no free lunch.


I ported an entire analytic solution from SQL Server to clickhouse in a few months. While the workarounds for updates aren't great it didn't come as a surprise since I've used other similar databases. The joining/null behavior is called out in the documentation so that wasn't a surprise either.

CH has been my favorite database since I discovered PostgreSQL 20 years ago. My viewpoint now is: don't use Postgres unless you can't use CH.


I can recommend Vertica: SQL, columnar storage, S3 backed, great extensibility, I could keep going. After several years of working with it, I can say it's my favorite OLAP DB that can be as fast as a transactional DB when handled correctly.

Confusingly, they have a Community License <https://docs.vertica.com/24.4.x/en/getting-started/community...>, but their actual repos on GitHub carry an Apache 2 license <https://github.com/vertica/vertica-containers/blob/main/one-...>, so I guess you're free to contribute to the getting-started files, while the product itself is under a binary license.

What is the null join behavior that caused you problems?


