💊 DATA Pill #169 - Persona vectors, HA Postgres on K8s, streaming lakehouses

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published Aug 11, 2025

Hi,

This week: steer LLMs with persona vectors, see how Airbnb runs Postgres HA on Kubernetes, and learn to build a streaming lakehouse with Flink and Iceberg. Plus, new tools from Google and Databricks, and Flink’s biggest release yet.

ARTICLE

Persona vectors: Monitoring and controlling character traits in language models | 7 min | AI Research | Anthropic

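Anthropic identifies persona vectors: directions in a model's activation space that correspond to character traits such as sycophancy or a tendency to hallucinate. They can be used to monitor and steer those traits without touching the prompt, a practical path toward more controllable AI.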

In MORE LINKS you will read:

Achieving High Availability with distributed database on Kubernetes at Airbnb
5 Ways Dremio Makes Apache Iceberg Lakehouses Easy
Five Python Tips You Won’t Find in Most Curriculums

{ MORE LINKS }

TUTORIALS

Build a Streaming Lakehouse with Flink, Kafka, Iceberg, and Polaris | 8 min | Data Engineering | Gilles Philippart | Personal Blog

A hands-on guide to setting up a streaming data lakehouse with schema evolution and end-to-end reliability using open-source tools.

NEWS

Apache Flink 2.1.0: Ushers in a New Era of Unified Real-Time Data + AI with Comprehensive Upgrades | 6 min | Streaming & AI | Apache Flink

New AI-native connectors, unified batch and stream processing, improved autoscaling, and hardened production stability make this Flink's most capable release yet.

In MORE LINKS you will read:

Apache Flink 2.1.0: Ushers in a New Era of Unified Real-Time Data + AI with Comprehensive Upgrades

{ MORE LINKS }

TOOLS

Introducing LangExtract: A Gemini powered information extraction library | 4 min | NLP | Akshay Goel, Atilla Kiraly | Google for Developers Blog

A lightweight Python library for information extraction with built-in schema validation and few-shot support. Built for fast, type-safe NLP pipelines.

In MORE LINKS you will read:

Databricks Labs LSQL

{ MORE LINKS }

EVENTS, CONFS, AND MEETUPS

Data Expo 2025 | 10-11 September | Utrecht

The largest data event in the Netherlands returns with 100+ vendors, 150+ sessions, and a packed agenda for engineers, scientists, and data leaders. Free to attend.

PINNACLE PICKS

Your top picks from last week:

Announcing Kedro 1.0 | 6 min | ML | QuantumBlack, AI by McKinsey

Kedro reaches 1.0 with improved modularity, long-term support, and new hooks for ML pipelines.

Stream Kafka Topic to the Iceberg Tables with Zero-ETL | 12 min | Data Streaming | Vu Trinh | Data Engineer Things

Learn how to stream Kafka data into Iceberg tables using Flink for real-time, zero-ETL pipelines.

Why Startups Are Betting Everything on Apache DataFusion | 5 min | Databases | Andrew Lamb | The New Stack Blog

DataFusion is winning over startups with its fast Rust-based query engine and plug-and-play architecture.

____________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DATA Pill

Adam from GetInData (now Xebia)
