Discussion about this post

Arshadh

Thanks for the post. I didn't know that Spark Connect does not exist yet. (Because of databricks-connect, I assumed a native version had always been there.)

A few questions.

1) Isn't a Spark application the same thing as the driver process? In your post, the two are referred to as separate items.

2) This may be something you plan to cover in part 2. How is resource isolation planned? Does it have something to do with FAIR pools?

Yogesh Gupta

Spark has always followed a lazy evaluation approach. Suppose I have a Spark job that performs hundreds of transformations using the Dataset APIs, reads over JDBC from another database, and reads from HDFS. In this scenario, how will execution happen? My second question: will this introduce too much traffic over Spark Connect? My third question: how are we going to configure the resources needed for a given complex Spark job?
