Delta Lake / Last Week in a Byte

These are all the meetings we have in "Last Week in a Byte" (part of the organization "Delta Lake"). Click into individual meeting pages to watch the recording and search or read the transcript.

18 Jul 2023

Welcome back to another edition of Last Week in a Byte! Over the next 2 editions, we'll recap all of the great Delta Lake sessions at this year's Data+AI Summit in San Francisco, CA.
  • 1 participant
  • 3 minutes
delta, lake, insights, dashboards, api, slack, ai, apps, recent, rust

6 Jun 2023

In this edition of Last Week in a Byte, we'll hear how Kubit uses Delta Sharing to power their product analytics platform, learn about an exciting new contribution to the Dask community, and cover 2 new releases from the Delta Lake and Delta Sharing projects!
  • 1 participant
  • 2 minutes
delta, latest, das, newsletter, slack, dilution, sharing, qubit, analytics, lake

30 May 2023

We have a #rust takeover in Delta Lake this week! Plus, we'll hear how SQLFluff can make your Spark SQL code on Delta Lake look pretty. And, of course, we can't forget to mention the upcoming Data + AI Summit this year from June 26 - 29 in sunny San Francisco, CA.
  • 1 participant
  • 3 minutes
delta, lake, ai, databricks, fluff, hosted, sql, weekly, slack, california

9 May 2023

Learn about the latest #deltalake news a week late! This quick byte includes all the great Delta Lake publications and shout-outs to #pydata #seattle
  • 1 participant
  • 3 minutes
lake, delta, mlflow, week, finally, ai, events, slack, comments, washington

18 Apr 2023

Learn about the latest #deltalake news a week late! This quick byte includes:
- delta-rs 0.9.0 release: https://github.com/delta-io/delta-rs/releases/tag/rust-v0.9.0

- delta-spark 2.3 release slides cc #apachespark https://www.linkedin.com/posts/willgirten_delta-lake-230-was-released-last-week-and-activity-7051596577588551680-cN5M/?lipi=urn%3Ali%3Apage%3Ad_flagship3_publishing_post_edit%3Bt9SxCaqKRzedAIzhUgh%2BQw%3D%3D

- Kotosiro is a minimalistic #rustlang implementation of a #deltasharing server that currently supports both #AWS and #GCP environments. https://github.com/kotosiro/sharing

- DataLakeIO connects #ApacheBeam and data lakes such as Apache Hudi, Apache Iceberg, and of course, Delta Lake https://github.com/nanhu-lab/beam-datalake

Great publications including:
- GeekCoders: How I use MACK Library in Delta Lake using Databricks/PySpark https://youtu.be/qRR5n6T2N_8

- Lakehouse by the sea: Migrating Seafowl storage layer to delta-rs by Marko Grujic https://www.splitgraph.com/blog/seafowl-delta-storage-layer

- Matthew Powers, CFA published two great Delta Lake blogs - How to use Delta Lake generated columns (https://delta.io/blog/2023-04-12-delta-lake-generated-columns/) and How to create and append to Delta Lake tables with pandas (https://delta.io/blog/2023-04-01-create-append-delta-lake-table-pandas/) #pandas #generatedcolumns

- Khuyen Tran posted How Delta Lake simplifies pandas DataFrame versioning and allows access to prior versions for auditing and debugging using delta-rs cc #pandas #dataframes https://www.linkedin.com/posts/khuyen-tran-1401_python-datascience-deltalake-activity-7050102630802460672-zfIV/?lipi=urn%3Ali%3Apage%3Ad_flagship3_publishing_post_edit%3Bt9SxCaqKRzedAIzhUgh%2BQw%3D%3D

- Longtime Apache Spark contributor and Databricks Beacon Bartosz Konieczny recently published Table file-formats - Z-Order compaction: Delta Lake. https://www.waitingforcode.com/delta-lake/table-file-formats-z-order-compaction-delta-lake/read
  • 1 participant
  • 3 minutes
delta, lake, latest, enhancements, slack, spark, rust, maintainer, io, github

11 Apr 2023

Learn about the latest #deltalake news a week late! This quick byte includes:
- We're excited about the release of Delta Lake 2.3 with all of its amazing features. Check out https://github.com/delta-io/delta/releases/tag/v2.3.0
#deltalake #apachespark

- A shout out to Will Girten for his great #deltasharing tips including how you can use Delta Sharing for #streaming https://www.linkedin.com/posts/willgirten_deltasharing-opensource-activity-7051192801446744064-8dYH

- A shout-out to the #finos #legend project, where Delta Lake has been included as an integration: https://legend.finos.org/docs/community/external-integrations

- Like Delta Users #slack but wish it had a searchable archive? Check out https://linen.delta.io

- A great blog by Nick Karpov on Support for Delta Lake Tables in #AWS #Lambda https://delta.io/blog/2023-04-06-deltalake-aws-lambda-wrangler-pandas/
  • 1 participant
  • 3 minutes
delta, latest, slack, lake, great, features, streaming, spark, log, merge

28 Mar 2023

Learn about the latest #deltalake news a week late! This quick byte includes:
- We’re excited about the #DeltaLake 2.3.0 Preview on #ApacheSpark 3.3. For more information, check out https://github.com/delta-io/delta/releases/tag/v2.3.0rc1

- Matthew Powers, CFA and Miguel Angel Diaz Rodriguez, MLOps Director at AB InBev, presented Model Serving with MLflow and Delta Lake main features at the Apache Spark Bogotá meetup

- We recently published the Delta Lake D3L2 vidcast Implementing a Data Lakehouse for Improved Data Science and Analytics at T-Mobile with Robert Thompson and Geoff Freeman, Members of Technical Staff at T-Mobile

- If you're in the Seattle area in April, there are two #pydata events:
PyData 2023 Preconference Meetup PARTY!: https://www.meetup.com/seattle-spark-meetup/events/291903857/

PyData Seattle 2023
https://pydata.org/seattle2023/

- Jacek Laskowski is adding more to his Internals of Delta Lake book, calling out #DeltaLake 2.2 metadata operations like LIMIT

- Nick Karpov published Z-Order: Visualization and Implementation, exploring the Delta Lake Spark connector’s Z-Order command through visualization and implementation. #zorder #apachespark

- Denny Lee's first blog as part of the “Ask Delta?” blog series is Why does altering a Delta Lake table schema not show up in the Spark DataFrame? #schema #apachespark

- Matthew Powers, CFA published How to Convert from CSV to Delta Lake #csv

- Jim Hibbard published Running ML Workflows with Delta Lake and Ray (Part 1) #rayml
  • 1 participant
  • 3 minutes
lake, mlflow, delta, recent, spark, events, flow, seattle, damgy, preview

21 Mar 2023

Learn about the latest #deltalake news a week late! This quick byte includes:
- We're proud to announce the release of delta-rs rust-v0.8.0! More information at https://github.com/delta-io/delta-rs/releases/tag/rust-v0.8.0. #rustlang

- Gurunath Rajagopal recently released his Lakehouse Sharing project, which demonstrates a table-format-agnostic data sharing server (based on the #deltasharing protocol) implemented in Python for both #deltalake and #apacheiceberg formats.

- Want to help with creating Delta Lake helper functions without Spark dependencies? Check out https://github.com/MrPowers/levi and chat with Matthew Powers, CFA, who created the levi, mack, and jodie Delta Lake helper function libraries.

- Get the latest Delta table version using mack helper functions.

- Joydeep Banik Roy published CHANGE DATA FEED — Time Travel — Failure Scenarios, Prevention & Recovery, which covers three scenarios where time travel queries fail and how to check for these errors easily.

- We are happy to partner with Blueprint on their Velocity Tour to bring you demos, meet and greets, speaking sessions, and more! They will be at Data Council Austin 2023, PyCon US 2023 in Salt Lake City, and PyData Seattle 2023 in Seattle for March and April. Check out the Velocity Tour for all of their dates!

- Robert Kossendey published the fourth blog in his insightful series on his journey to the #lakehouse with the post Lakehouse - A resumé.
  • 1 participant
  • 3 minutes
delta, lake, 2023, project, version, rust, blueprint, journey, thanks, server

14 Mar 2023

Learn about the latest #deltalake news a week late! This quick byte includes:
- Our #deltalake contributor of the month is Gerhard Brueckl!

- delta-rs 0.7.0 was released, which includes various enhancements, fixed bugs, and merged PRs. Check out New features in the Python deltalake 0.7.0 release of delta-rs for more information on the #python release cc #rustlang

- Brayan Jules Jacques published HIO: a library that provides elegant functions to manage HDFS filesystem and cloud object stores

- D3L2: Massive Data Processing in Adobe Experience Platform Using Delta Lake, in which Yeshwanth Vijayakumar, Senior Engineering Manager and Architect at Adobe, discusses how the #data #lakehouse architecture at Adobe Experience Platform combines with the #realtime Customer Profile architecture to increase their #apachespark batch workload throughput and reduce costs while maintaining functionality - with #deltalake.

- D3L2: The Journey Unifying Data Lake and Data Warehouse with Robert Kossendey of claimsforce, on their journey to unifying #datalakes and #datawarehouses cc Robert Kossendey, claimsforce

- We recently had the fun London meetup Building reliable lakehouses with Delta Lake Primer and AMA with Simon Whiteley and Denny Lee

- Vítor Teixeira published Delta Lake — Automatic Schema Evolution and contributed PR 1645.

- Vítor Teixeira published Delta Lake: Keeping It Fast and Clean on how to improve your Delta tables’ performance.

- Brayan Jules Jacques published Jodie - Append Without Duplication which discusses Jodie (an open-source library to perform common #deltalake operations using #apachespark and #scala) focusing on deduplication functions.

- Bryan Cafferky published Understanding Delta File Logs - The Heart of #DeltaLake.

- TR Raveendra published a seven-part #DeltaLake tutorial series!

- Explore the full range of the merge command with Delta Lake Merge

- Want to know more on how to contribute to delta-spark? Check out Getting started contributing to Delta Lake Spark.

- Register for PyData Seattle 2023
  • 1 participant
  • 3 minutes
delta, contributor, lake, pie, week, suggestions, features, seattle, pi, hio

10 Jan 2023

Learn about the latest #deltalake news a week late! This quick byte includes:
- delta-rs 0.6.0 release, which includes support for check invariants, support for DataFusion 15, a new Python binding release GitHub Action for universal2 wheel, and more! https://github.com/delta-io/delta-rs/releases/tag/rust-v0.6.0 #rustlang #python #datafusion

- Noritaka Sekiyama, Kyle Duong, and Sandeep Adwankar published Introducing native Delta Lake table support with AWS Glue crawlers on the AWS Big Data blog. https://aws.amazon.com/blogs/big-data/introducing-native-delta-lake-table-support-with-aws-glue-crawlers/ #aws #bigdata #awsglue

- Matthew Powers and Chitral Verma published Reading Delta Lake Tables into Polars DataFrames https://delta.io/blog/2022-12-22-reading-delta-lake-tables-polars-dataframe/ #polars

- Mehdi Ouazza published Python Devs, It's Time To Get On The Rust Bandwagon! https://youtu.be/j_1uUbxDWjY #python #rustlang

- Want to watch Delta Rust code development? Join R. Tyler Croy (agentdero) LIVE on twitch.tv. https://www.twitch.tv/agentdero/schedule?seriesID=d8138556-c4c2-4383-a2cd-44a12610dc83 #twitch

- A big shout out to Florian Valeye who is Delta Lake's contributor of the month! https://delta.io/profiles/florian-valeye/
  • 1 participant
  • 2 minutes
delta, rust, twitch, latest, watch, weekly, lake, lee, thanks, blog

20 Dec 2022

Learn about the latest #deltalake news a week late! This quick byte includes:
- aws-sdk-pandas (aka aws-wrangler), which includes an optional dependency on Delta Lake to simplify integration with AWS services: https://github.com/aws/aws-sdk-pandas/pull/1834

- polars, blazingly fast DataFrames in Rust, Python, and Node.js, has support for reading Delta Lake tables: https://github.com/pola-rs/polars/pull/5761

- It is also included in the Python Polars 0.15.3 release: https://github.com/pola-rs/polars/releases/tag/py-0.15.3

- Christina Taylor, who leads the Next Generation Communications Platform at Carvana, recently published the blog “Open Format in an Omni-cloud World: From EDW targets to Structured Streaming on Delta Lake”, which showcases her team’s migration from a data warehouse (BigQuery) to Delta Lake using structured streaming: https://medium.com/@christinataylor0926/open-format-in-an-omni-cloud-world-from-edw-targets-to-structured-streaming-on-delta-lake-7913523a868e

- Simon Whiteley and I had our monthly Ask Us Anything answering your most pressing Delta Lake questions https://youtu.be/2HxXM150TnA

- Vini Jaiswal, Florian Valeye, and Will Girten did a 2022 retrospective in the last community office hours for the year on Delta Spark, Delta Rust, and Delta Sharing respectively. https://youtu.be/4OK7jpzj5yM

- At Linux Foundation’s Open Source in Finance Forum, Antoine Amend and Ashley Trainer discuss the FINOS Legend project as part of the session Modernize Regulatory Reporting: Get Ready for T+1 Settlement https://youtu.be/wDxm-zAnlno where they include a call out for Delta Lake to harmonize with open data standards

Delta Sharing connectors:
- Java connector: https://github.com/databrickslabs/delta-sharing-java-connector
- MLflow: https://github.com/databrickslabs/arcuate
- Node.js: https://github.com/goodwillpunning/nodejs-sharing-client
- Power BI: https://learn.microsoft.com/en-us/power-query/connectors/deltasharing
- GoLang: https://github.com/magpierre/delta-sharing/tree/golangdev/golang/delta_sharing_go
- C++: https://github.com/magpierre/cpp_delta_sharing_client
- Terminal: https://github.com/magpierre/dsmb
- Airflow: https://github.com/apache/airflow/pull/22692
- R: https://github.com/zacdav-db/delta-sharing-r
- Excel: https://www.exponam.com/exponam-launches-delta-sharing-excel-add-in/
- Rust: https://github.com/r3stl355/delta-sharing-rust-client
  • 1 participant
  • 3 minutes
delta, lake, streaming, flow, integrations, latest, harmonize, forum, aws, wrangler

13 Dec 2022

Learn about the latest #deltalake news a week late! This quick byte includes:
- Delta Connectors 0.6.0: https://github.com/delta-io/connectors/releases/tag/v0.6.0 which includes support for the Flink/Delta connector on #apacheflink 1.15.3

- Delta Sharing 0.6.0: https://github.com/delta-io/delta-sharing/releases/tag/v0.6.0 #deltasharing

- Delta Spark 2.2 Release supporting #apachespark 3.3 https://github.com/delta-io/delta/releases/tag/v2.2.0

- D3L2: Discussing Data Integration with Airbyte and Delta Lake: https://youtu.be/sa49O_Ho1X4

- D3L2: The Genesis of Delta Rust with QP Hou: https://youtu.be/ZQdEdifcBh8

- Delta Lake is the foundation of an open data platform at Scribd: https://www.databricks.com/customers/scribd/delta-lake

- Data Sharing across Government Agencies using Delta Sharing: https://delta.io/blog/2022-12-08-data-sharing-across-government-delta-sharing/

- Linux Foundation called out the Delta Lake project: https://project.linuxfoundation.org/hubfs/LF%20Research/2022%20Linux%20Foundation%20Annual%20Report.pdf?hsLang=en

We're happy to announce two new Delta Lake maintainers
- Will Jones: https://github.com/wjones127
- Robert Pack: https://github.com/roeap
  • 1 participant
  • 2 minutes
delta, lake, latest, connector, flink, sharing, streaming, enjoyed, rust, week

6 Dec 2022

Learn about the latest Delta Lake news a week late! This quick byte includes:
- Delta Lake Rust 0.5.0 release with 42 contributors and 34 new contributors #rustlang
https://github.com/delta-io/delta-rs/releases/tag/rust-v0.5.0

- Delta Lake Rust Python 0.6.4 release #python
https://github.com/delta-io/delta-rs/releases/tag/python-v0.6.4

- Try out the latest test version of Delta Spark 2.2 RC1 #apachespark Apache Spark
https://github.com/delta-io/delta/releases/tag/v2.2.0rc1

- Setting the Table: Benchmarking Open Table Formats
https://brooklyndata.co/blog/benchmarking-open-table-formats

- AWS Glue 4.0 includes native support for Delta Lake 2.1 with Apache Spark 3.3 #awsglue
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html

- AWS EMR 6.9 includes native support for Delta Lake 2.1 with Apache Spark 3.3. #emr
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-690-release.html

- Robert Kossendey from claimsforce recently published Lakehouse — Running Delta Lake on AWS Glue
https://medium.com/claimsforce/lakehouse-running-delta-lake-on-top-of-aws-glue-181133a916f3

- Matthew Powers recently published mack - Delta Lake operations helper functions
https://github.com/MrPowers/mack

- Thanks to Andrew Bauman for creating the Delta Lake #docker which includes #rustlang, #python, #pyspark, #spark, #scala, and #jupyternotebook. Try it out now https://go.delta.io/docker.
  • 1 participant
  • 2 minutes
docker, delta, latest, lake, newsletter, spark, week, late, aws, benchmarking