
[SPARK-12177][Streaming][Kafka] Update KafkaDStreams to new Kafka 0.10 Consumer API #11863


Closed
wants to merge 42 commits into apache:master from koeninger:kafka-0.9

Conversation

koeninger
Contributor

@koeninger koeninger commented Mar 21, 2016

What changes were proposed in this pull request?

New Kafka consumer api for the released 0.10 version of Kafka

How was this patch tested?

Unit tests, manual tests
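
For context, here is a minimal Scala sketch of the new direct stream API, in the shape it settled into after the follow-up refactor in #13996 (ssc is assumed to be an existing StreamingContext; the broker list, group id, and topic are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._

// Placeholder configuration. "enable.auto.commit" is false so that Spark,
// not the Kafka client, decides when offsets are committed.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "host1:port1,host2:port2",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("exampleTopic"), kafkaParams)
)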

@SparkQA

SparkQA commented Mar 21, 2016

Test build #53680 has finished for PR 11863 at commit 546246e.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 21, 2016

Test build #53682 has finished for PR 11863 at commit 477055c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 21, 2016

Test build #53686 has started for PR 11863 at commit ba41956.

@SparkQA

SparkQA commented Apr 7, 2016

Test build #55252 has finished for PR 11863 at commit e559183.

  • This patch fails MiMa tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-beta-assembly_2.11</artifactId>
<packaging>jar</packaging>
<name>Spark Project External Kafka Assembly</name>
Member

I think it may be a good idea to update this, so the two kafka assemblies can be differentiated in the build.
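
(For example, both the artifactId and the <name> above could carry the Kafka version, along the lines of a hypothetical "Spark Project External Kafka 0.10 Assembly", so the two assemblies can be told apart at a glance.)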

@Experimental
object KafkaUtils extends Logging {
/**
* Scala constructor for a batch-oriented interface for consuming from Kafka.
Member

@zsxwing zsxwing Jun 29, 2016

Please add :: Experimental :: at the beginning of comments if you add the @Experimental tag.
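
For reference, applied to the snippet above the convention reads:

/**
 * :: Experimental ::
 * Scala constructor for a batch-oriented interface for consuming from Kafka.
 */
@Experimental
object KafkaUtils extends Logging {
  // ...
}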

@zsxwing
Member

zsxwing commented Jun 29, 2016

Finished my round of reviewing. Some nits and one question about commitAsync left.

@koeninger
Contributor Author

@zsxwing Thanks for the fixes

* configuration parameters</a>.
* Requires "bootstrap.servers" to be set with Kafka broker(s),
* NOT zookeeper servers, specified in host1:port1,host2:port2 form.
* @param driverConsumer zero-argument function for you to construct a Kafka Consumer,
Contributor

@tdas tdas Jun 29, 2016

fix docs

@tdas
Contributor

tdas commented Jun 30, 2016

Overall, this is looking good. Two high level points.

  1. Now we have two subprojects both creating org.apache.spark.streaming.kafka.KafkaUtils. I think this is going to cause problems downstream in the docs and elsewhere. It will also be hard for users to disambiguate which one is being used in code. So I propose changing the package name for the new one to o.a.s.streaming.kafka010, with all the new classes in that package. What do you think? (See the import sketch after this list.)
  2. Don't need CanCommitOffsets.
  3. We need a whole lot of new docs, especially updates in the streaming kafka integration guide. That will be a different PR. Could you start working on that?
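
For illustration, the proposed rename makes the two integrations distinguishable at the import site (one per file, or disambiguated with an import alias):

import org.apache.spark.streaming.kafka.KafkaUtils      // existing 0.8 consumer API
import org.apache.spark.streaming.kafka010.KafkaUtils   // this PR, after the rename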

@koeninger
Contributor Author

You do need CanCommitOffsets, because DirectKafkaInputDStream is now private, so otherwise you have nothing to cast to in order to access that method.
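
A minimal sketch of the pattern in question, assuming stream was created by this PR's createDirectStream:

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch ...
  // DirectKafkaInputDStream is private, so the public trait is the only
  // handle through which user code can reach commitAsync:
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}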

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61495 has finished for PR 11863 at commit 31502d9.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Jun 30, 2016

Aah, right. My bad. In that case, there aren't major issues as far as I can see; let me merge this and test how the docs look. I am pretty sure it's going to cause trouble with two KafkaUtils, and in that case I will handle the package renaming.

@tdas
Contributor

tdas commented Jun 30, 2016

Well.. after the tests pass.

@koeninger
Contributor Author

I'll do the scaladoc fix and the package rename. I think the package rename is fine even if it did work with docs, just to disambiguate things.

Will start a separate ticket for documentation updates.

@tdas
Contributor

tdas commented Jun 30, 2016

sounds good. thanks!

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61506 has finished for PR 11863 at commit f863369.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #3151 has finished for PR 11863 at commit f863369.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #3150 has finished for PR 11863 at commit f863369.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61513 has finished for PR 11863 at commit cffb0e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Jun 30, 2016

LGTM. Merging this to master and 2.0. Thank you very much @koeninger for this awesome effort. :)

@asfgit asfgit closed this in dedbcee Jun 30, 2016
asfgit pushed a commit that referenced this pull request Jun 30, 2016
[SPARK-12177][Streaming][Kafka] Update KafkaDStreams to new Kafka 0.10 Consumer API

## What changes were proposed in this pull request?

New Kafka consumer api for the released 0.10 version of Kafka

## How was this patch tested?

Unit tests, manual tests

Author: cody koeninger <cody@koeninger.org>

Closes #11863 from koeninger/kafka-0.9.

(cherry picked from commit dedbcee)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

// make sure constructors can be called from java
final ConsumerStrategy<String, String> sub0 =
Subscribe.<String, String>apply(topics, kafkaParams, offsets);
Contributor

This seems to break in Scala 2.10 but not Scala 2.11. This is very weird.

Merging this PR broke 2.10 builds - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.10/1947/console

[error] /home/jenkins/workspace/spark-master-compile-sbt-scala-2.10/external/kafka-0-10/src/test/java/org/apache/spark/streaming/kafka010/JavaConsumerStrategySuite.java:54:  error: incompatible types: Collection<String> cannot be converted to Iterable<String>
[error]       Subscribe.<String, String>apply(topics, kafkaParams, offsets);
[error]                                       ^
[error] /home/jenkins/workspace/spark-master-compile-sbt-scala-2.10/external/kafka-0-10/src/test/java/org/apache/spark/streaming/kafka010/JavaConsumerStrategySuite.java:69:  error: incompatible types: Collection<TopicPartition> cannot be converted to Iterable<TopicPartition>
[error]       Assign.<String, String>apply(parts, kafkaParams, offsets);
[error]                                    ^

Contributor

We should figure out a way to fix scala 2.10. I don't think we need to revert this though since 2.10 is no longer the default build and it does not fail PRs.

Contributor

Okay, found the issue. In Scala 2.10, if the companion object of a case class has an explicitly defined apply(), then the synthetic apply method is not generated. In Scala 2.11 it is generated.

I remember now, this type of stuff is why we avoid using case classes in the public API. Do you mind if I convert these to simple classes?
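
A standalone sketch of the 2.10 behavior being described, using a hypothetical class that mirrors the Subscribe situation:

import scala.collection.JavaConverters._

case class Wrapper(xs: java.util.Collection[String])

object Wrapper {
  // Explicit Scala-friendly overload defined on the companion.
  def apply(xs: Iterable[String]): Wrapper =
    new Wrapper(xs.asJavaCollection)
}

// Java callers passing a java.util.Collection rely on the synthetic
// Wrapper.apply(Collection) derived from the case class constructor.
// Scala 2.11 still generates it alongside the explicit overload; Scala 2.10
// suppresses it, leaving only apply(Iterable) and producing the
// incompatible-types errors in the Jenkins log above.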

Contributor

I refactored the API to avoid case classes and minimize publicly visible classes - #13996

@tdas
Contributor

tdas commented Jun 30, 2016

I played around with the API and I found a few issues:

  1. As I mentioned above, case classes lead to problems in the public API. The API could also be simpler, and the same for both Java and Scala users (I don't like the apply vs. create split).
  2. The wrapping between Java and Scala maps sometimes left the map non-serializable, causing serialization issues in the Subscribe and Assign strategies when checkpointing.
  3. ConsumerStrategy is an interface, making it hard to add methods later without breaking compatibility.

I have opened a new PR to address them - please take a look - #13996
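
For reference, a sketch of the shape the API took in that follow-up: ConsumerStrategy became a class rather than an interface, and instances come from factory methods on ConsumerStrategies that read the same from Java and Scala (topics and kafkaParams assumed in scope):

val strategy = ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)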

@BiyuHuang

BiyuHuang commented Aug 3, 2017

Hey, I have a question about the setting "enable.auto.commit": can it be changed? I want to save the offset information to a ZooKeeper cluster.

@koeninger
Contributor Author

koeninger commented Aug 3, 2017 via email

@BiyuHuang

I'm wondering why the setting "enable.auto.commit" exists but is set to false by default, and why I couldn't modify it. Anyway, how do I use it?
