💊 DATA Pill #169 - Persona vectors, HA Postgres on K8s, streaming lakehouses

Hi,

This week: steer LLMs with persona vectors, see how Airbnb runs Postgres HA on Kubernetes, and learn to build a streaming lakehouse with Flink and Iceberg. Plus, tools from Google and Databricks and Flink’s biggest release yet.

ARTICLE

Persona vectors: Monitoring and controlling character traits in language models | 7 min | AI Research | Anthropic

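Anthropic identifies persona vectors: directions in a model's activation space that correspond to character traits such as sycophancy or a tendency to hallucinate. They can be used to monitor and steer those traits without touching the prompt. A practical path toward more controllable AI.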

In MORE LINKS you will read:

  • Achieving High Availability with distributed database on Kubernetes at Airbnb

  • 5 Ways Dremio Makes Apache Iceberg Lakehouses Easy

  • Five Python Tips You Won’t Find in Most Curriculums

{ MORE LINKS }

TUTORIALS

Build a Streaming Lakehouse with Flink, Kafka, Iceberg, and Polaris | 8 min | Data Engineering | Gilles Philippart | Personal Blog

A hands-on guide to setting up a streaming data lakehouse with schema evolution and end-to-end reliability using open-source tools.

NEWS

Apache Flink 2.1.0: Ushers in a New Era of Unified Real-Time Data + AI with Comprehensive Upgrades | 6 min | Streaming & AI | Apache Flink

New AI-native connectors, unified batch and stream processing, improved autoscaling, and hardened production stability make this Flink's most capable release yet.

In MORE LINKS you will read:

  • Apache Flink 2.1.0: Ushers in a New Era of Unified Real-Time Data + AI with Comprehensive Upgrades

{ MORE LINKS }

TOOLS

Introducing LangExtract: A Gemini powered information extraction library | 4 min | NLP | Akshay Goel, Atilla Kiraly | Google for Developers Blog

A lightweight Python library for information extraction with built-in schema validation and few-shot support. Built for fast, type-safe NLP pipelines.

In MORE LINKS you will read:

  • Databricks Labs LSQL

{ MORE LINKS }

EVENTS, CONFS, AND MEETUPS

Data Expo 2025 | 10-11th September | Utrecht

The largest data event in the Netherlands returns with 100+ vendors, 150+ sessions, and a packed agenda for engineers, scientists, and data leaders. Free to attend.

PINNACLE PICKS

Your top picks from last week:

Announcing Kedro 1.0 | 6 min | ML | QuantumBlack, AI by McKinsey

Kedro reaches 1.0 with improved modularity, long-term support, and new hooks for ML pipelines.

Stream Kafka Topic to the Iceberg Tables with Zero-ETL | 12 min | Data Streaming | Vu Trinh | Data Engineer Things

Learn how to stream Kafka data into Iceberg tables using Flink for real-time, zero-ETL pipelines.

Why Startups Are Betting Everything on Apache DataFusion | 5 min | Databases | Andrew Lamb | The New Stack Blog

DataFusion is winning over startups with its fast Rust-based query engine and plug-and-play architecture.

____________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DATA Pill

Adam from GetInData is Now Xebia
