Data Team's Mission at PostHog
The Data Team's mission is to provide a storage and query engine that meets these requirements:
- Continue to meet the needs of the product, today and in the future
- Maintain and optimize our current ClickHouse deployment
- Elastically scale our capacity with little effort
- Support multiple query quality-of-service (QoS) guarantees (real-time, batch, etc.)
- Data is stored once and queryable from the appropriate tool
- Queries are optimized for cost and performance
- Tunable execution performance to allow trade-offs between cost and performance
- Storage is durable
Goals for Q4
In service of this mission, our goals for Q4 are:
- Improve the elasticity and flexibility of our data store by moving all of our data into Iceberg
- Work effectively with Altinity to ship the read path for Iceberg on ClickHouse - Brett Hoerner
- Set up infrastructure to ship all of our data to Iceberg on S3 - James Greenhill
- Ship query logs to Iceberg - James Greenhill
- Continue improving our ClickHouse (CH) operational expertise
- Upgrade to a later version of ClickHouse - James Greenhill
- Capacity planning - James Greenhill
- Automation - Daniel Escribano
- Document some of the basic mitigation operations in runbooks - Daniel Escribano
- Schema management
- Tool for schema migration (coordinator schemas) - Daniel Escribano
- Tool for long-running mutations - Daniel Escribano
- Continued investment in performance
- Tooling for other teams to understand which queries are slow and why - Ted Kaemming
- Investigate variability in query performance - Ted Kaemming
- Per-team limits on queries/query complexity (needs product work) - Ted Kaemming