Archive - Big Data Performance Weekly

Apache Spark Declarative Pipelines: The Evolution from Imperative to Declarative Data Engineering - Part 1

Everything You Need to Know About Apache Spark 4.1's Game-Changing Declarative Pipeline Feature

Jul 10 • Daniel Aronovich

June 2025

Databricks & Snowflake Summits 2025 aftermath - Part 1: Catalog Wars

The Data Catalog Wars Heat Up: What We Learned from Conference Season

Jun 19 • Daniel Aronovich

May 2025

Apache Spark 4.0: What the Performance Improvements Actually Mean for Your Daily Work

Why Spark 4.0 means less time tuning and more time building

May 29 • Daniel Aronovich

Apache Spark on Kubernetes: From Manual Submissions to Operators - Part 1

A practitioner's guide to the Spark-on-Kubernetes toolbox and when to use each approach

May 22 • Daniel Aronovich

AI Assistants vs. Big Data: The Missing Infrastructure Context - Part 2

Why Performance Optimization Requires More Than AI-Generated Spark Code in Production Data Pipelines

May 15 • Daniel Aronovich

Why Big Data Engineers Are Missing Out on the AI Code Revolution - Part 1

The Ultimate Paradox of 2025: Big Data Engineers Top the Charts Yet Fall Behind

May 8 • Daniel Aronovich

April 2025

Unpacking Trino's Query Execution: From SQL to Splits

Understanding How Your SQL Query Flows Through Trino's Distributed Architecture

Apr 24 • Daniel Aronovich

Understanding Apache Spark's Execution Hierarchy: From Applications to Tasks

Learn how the Apache Spark execution model works from Applications to Jobs, Stages, and Tasks. To optimize performance and debug your big data…

Apr 17 • Daniel Aronovich

Mastering Apache Spark Partitioning: Coalesce vs. Repartition

Optimizing Performance Through Strategic Data Distribution

Apr 10 • Daniel Aronovich

Understanding Wide vs. Narrow Transformations in Apache Spark: Why It Matters for Performance

Optimize Your Spark Jobs by Understanding Data Dependencies and Shuffle Operations

Apr 3 • Daniel Aronovich

March 2025

Transformations vs. Actions in Apache Spark: The Key to Efficient Data Processing

Understanding the Core Mechanics Behind Spark's Lazy Evaluation Model

Mar 27 • Daniel Aronovich

Spark Connect Part 2: Debugging and Performance Breakthroughs

Ever spent an entire day trying to debug a Spark job that failed hours into execution? What if you could identify issues in seconds instead?

Mar 20 • Daniel Aronovich
