I’ve been trying out Kafka recently. I followed some of the quick start guide without problem. Then I got to the Hadoop consumer section, which is pretty crucial for my use case, and I got forwarded to a simple github page.
I will describe the problems I’ve had while following that github README, along with the solutions I found to solve them. Hopefully, that can end up helping someone out there
Step 1
Step 1 of the github read me failed for me with the following error:
root@my-server~/kafka-hadoop-consumer/kafka# ./sbt package
[info] Building project Kafka 0.7 against Scala 2.8.0
[info] using KafkaProject with sbt 0.7.5 and Scala 2.7.7
[info]
[info] == core-kafka / compile ==
[info] Source analysis: 126 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/Kafka.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/Kafka.scala:23: value log4j is not a member of package org.apache
[error] import org.apache.log4j.jmx.LoggerDynamicMBean
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/ConsoleConsumer.scala:22: not found: value joptsimple
[error] import joptsimple._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/ConsumerConnector.scala:21: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/ConsumerIterator.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/PartitionTopicInfo.scala:25: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/Fetcher.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/FetcherRunnable.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/KafkaMessageStream.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/SimpleConsumer.scala:23: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/TopicCount.scala:21: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala:22: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/utils/KafkaScheduler.scala:22: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] error while loading ZooKeeper, Missing dependency 'class org.apache.log4j.Logger', required by /root/kafka-hadoop-consumer/kafka/core/lib/zookeeper-3.3.3.jar(org/apache/zookeeper/ZooKeeper.class)
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/storage/OracleOffsetStorage.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/consumer/storage/OracleOffsetStorage.scala:38: recursive value offset needs type
[error] case Some(offset) => offset
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/javaapi/Implicits.scala:19: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/javaapi/message/ByteBufferMessageSet.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/ProducerPool.scala:22: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/async/AsyncProducer.scala:22: value log4j is not a member of package org.apache
[error] import org.apache.log4j.{Level, Logger}
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/async/ProducerSendThread.scala:21: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/log/Log.scala:24: value log4j is not a member of package org.apache
[error] import org.apache.log4j._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/log/LogManager.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/message/ByteBufferMessageSet.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/message/CompressionUtils.scala:25: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/message/FileMessageSet.scala:23: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/network/SocketServer.scala:28: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/network/Transmission.scala:21: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/ConfigBrokerPartitionInfo.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.spi.LoggingEvent
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala:21: value log4j is not a member of package org.apache
[error] import org.apache.log4j.{Logger, AppenderSkeleton}
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala:80: value closed is not a member of kafka.producer.KafkaLog4jAppender
[error] if(!this.closed) {
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala:81: value closed is not a member of kafka.producer.KafkaLog4jAppender
[error] this.closed = true
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/Producer.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/SyncProducer.scala:26: value log4j is not a member of package org.apache
[error] import org.apache.log4j.{Level, Logger}
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/ZKBrokerPartitionInfo.scala:21: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/producer/async/DefaultEventHandler.scala:21: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/server/KafkaRequestHandlers.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/server/KafkaServer.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/server/KafkaServerStartable.scala:19: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/server/KafkaZooKeeper.scala:20: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/ConsumerPerformance.scala:23: not found: value joptsimple
[error] import joptsimple._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/ConsumerPerformance.scala:28: too many arguments for constructor Object: ()java.lang.Object
[error] abstract class ShutdownableThread(name: String) extends Thread(name) {
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/ConsumerShell.scala:19: not found: value joptsimple
[error] import joptsimple._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/GetOffsetShell.scala:20: not found: value joptsimple
[error] import joptsimple._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/ProducerPerformance.scala:25: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/ProducerPerformance.scala:26: not found: value joptsimple
[error] import joptsimple.{OptionSet, OptionParser}
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/ProducerShell.scala:21: not found: value joptsimple
[error] import joptsimple._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/ReplayLogProducer.scala:4: not found: value joptsimple
[error] import joptsimple.OptionParser
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/ReplayLogProducer.scala:5: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/SimpleConsumerPerformance.scala:20: not found: value joptsimple
[error] import joptsimple._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/tools/SimpleConsumerShell.scala:20: not found: value joptsimple
[error] import joptsimple._
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/utils/Throttler.scala:19: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/utils/Utils.scala:25: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] /root/kafka-hadoop-consumer/kafka/core/src/main/scala/kafka/utils/ZkUtils.scala:25: value log4j is not a member of package org.apache
[error] import org.apache.log4j.Logger
[error] ^
[error] 55 errors found
[info] == core-kafka / compile ==
[error] Error running compile: Compilation failed
[info]
[info] Total time: 19 s, completed Sep 19, 2011 10:17:24 AM
[info]
[info] Total session time: 20 s, completed Sep 19, 2011 10:17:24 AM
[error] Error during build.
I had never used SBT before, so at first I was puzzled. Why would they post a command that spits out errors in their README?
It turns out the solution was pretty simple. I figured it out by looking at a similar problem on the old (pre-Apache incubation) Kafka mailing list. You just need to run ./sbt update, which prints out the following output:
root@my-server~/kafka-hadoop-consumer/kafka# ./sbt update
[info] Building project Kafka 0.7 against Scala 2.8.0
[info] using KafkaProject with sbt 0.7.5 and Scala 2.7.7
[info]
[info] == core-kafka / update ==
[info] downloading http://repo2.maven.org/maven2/net/sf/jopt-simple/jopt-simple/3.2/jopt-simple-3.2.jar ...
[info] [SUCCESSFUL ] net.sf.jopt-simple#jopt-simple;3.2!jopt-simple.jar (426ms)
[info] downloading http://repo2.maven.org/maven2/log4j/log4j/1.2.15/log4j-1.2.15.jar ...
[info] [SUCCESSFUL ] log4j#log4j;1.2.15!log4j.jar (429ms)
[info] downloading http://repo2.maven.org/maven2/junit/junit/4.1/junit-4.1.jar ...
[info] [SUCCESSFUL ] junit#junit;4.1!junit.jar (246ms)
[info] downloading http://repo2.maven.org/maven2/org/easymock/easymock/3.0/easymock-3.0.jar ...
[info] [SUCCESSFUL ] org.easymock#easymock;3.0!easymock.jar (160ms)
[info] downloading http://repo2.maven.org/maven2/org/scalatest/scalatest/1.2/scalatest-1.2.jar ...
[info] [SUCCESSFUL ] org.scalatest#scalatest;1.2!scalatest.jar (537ms)
[info] downloading http://repo2.maven.org/maven2/cglib/cglib-nodep/2.2/cglib-nodep-2.2.jar ...
[info] [SUCCESSFUL ] cglib#cglib-nodep;2.2!cglib-nodep.jar (57ms)
[info] downloading http://repo2.maven.org/maven2/org/objenesis/objenesis/1.2/objenesis-1.2.jar ...
[info] [SUCCESSFUL ] org.objenesis#objenesis;1.2!objenesis.jar (235ms)
[info] :: retrieving :: kafka#core-kafka_2.8.0 [sync]
[info] confs: [compile, runtime, test, provided, system, optional, sources, javadoc]
[info] 7 artifacts copied, 0 already retrieved (2748kB/39ms)
[info] == core-kafka / update ==
[info]
[info] == hadoop producer / update ==
[info] downloading http://repo1.maven.org/maven2/org/codehaus/jackson/jackson-core-asl/1.5.5/jackson-core-asl-1.5.5.jar ...
[info] [SUCCESSFUL ] org.codehaus.jackson#jackson-core-asl;1.5.5!jackson-core-asl.jar (555ms)
[info] downloading http://repo1.maven.org/maven2/org/apache/avro/avro/1.4.1/avro-1.4.1.jar ...
[info] [SUCCESSFUL ] org.apache.avro#avro;1.4.1!avro.jar (452ms)
[info] downloading http://repo1.maven.org/maven2/org/codehaus/jackson/jackson-mapper-asl/1.5.5/jackson-mapper-asl-1.5.5.jar ...
[info] [SUCCESSFUL ] org.codehaus.jackson#jackson-mapper-asl;1.5.5!jackson-mapper-asl.jar (149ms)
[info] downloading http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.5.11/slf4j-api-1.5.11.jar ...
[info] [SUCCESSFUL ] org.slf4j#slf4j-api;1.5.11!slf4j-api.jar (291ms)
[info] downloading http://repo1.maven.org/maven2/com/thoughtworks/paranamer/paranamer/2.2/paranamer-2.2.jar ...
[info] [SUCCESSFUL ] com.thoughtworks.paranamer#paranamer;2.2!paranamer.jar (336ms)
[info] downloading http://repo1.maven.org/maven2/com/thoughtworks/paranamer/paranamer-ant/2.2/paranamer-ant-2.2.jar ...
[info] [SUCCESSFUL ] com.thoughtworks.paranamer#paranamer-ant;2.2!paranamer-ant.jar (139ms)
[info] downloading http://repo1.maven.org/maven2/org/mortbay/jetty/jetty/6.1.22/jetty-6.1.22.jar ...
[info] [SUCCESSFUL ] org.mortbay.jetty#jetty;6.1.22!jetty.jar (535ms)
[info] downloading http://repo1.maven.org/maven2/org/apache/velocity/velocity/1.6.4/velocity-1.6.4.jar ...
[info] [SUCCESSFUL ] org.apache.velocity#velocity;1.6.4!velocity.jar (133ms)
[info] downloading http://repo1.maven.org/maven2/commons-lang/commons-lang/2.5/commons-lang-2.5.jar ...
[info] [SUCCESSFUL ] commons-lang#commons-lang;2.5!commons-lang.jar (216ms)
[info] downloading http://repo1.maven.org/maven2/com/thoughtworks/paranamer/paranamer-generator/2.2/paranamer-generator-2.2.jar ...
[info] [SUCCESSFUL ] com.thoughtworks.paranamer#paranamer-generator;2.2!paranamer-generator.jar (212ms)
[info] downloading http://repo1.maven.org/maven2/org/apache/ant/ant/1.7.1/ant-1.7.1.jar ...
[info] [SUCCESSFUL ] org.apache.ant#ant;1.7.1!ant.jar (343ms)
[info] downloading http://repo1.maven.org/maven2/com/thoughtworks/qdox/qdox/1.10.1/qdox-1.10.1.jar ...
[info] [SUCCESSFUL ] com.thoughtworks.qdox#qdox;1.10.1!qdox.jar (447ms)
[info] downloading http://repo1.maven.org/maven2/asm/asm/3.2/asm-3.2.jar ...
[info] [SUCCESSFUL ] asm#asm;3.2!asm.jar (112ms)
[info] downloading http://repo1.maven.org/maven2/org/apache/ant/ant-launcher/1.7.1/ant-launcher-1.7.1.jar ...
[info] [SUCCESSFUL ] org.apache.ant#ant-launcher;1.7.1!ant-launcher.jar (248ms)
[info] downloading http://repo1.maven.org/maven2/org/mortbay/jetty/jetty-util/6.1.22/jetty-util-6.1.22.jar ...
[info] [SUCCESSFUL ] org.mortbay.jetty#jetty-util;6.1.22!jetty-util.jar (321ms)
[info] downloading http://repo1.maven.org/maven2/org/mortbay/jetty/servlet-api/2.5-20081211/servlet-api-2.5-20081211.jar ...
[info] [SUCCESSFUL ] org.mortbay.jetty#servlet-api;2.5-20081211!servlet-api.jar (213ms)
[info] downloading http://repo1.maven.org/maven2/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar ...
[info] [SUCCESSFUL ] commons-collections#commons-collections;3.2.1!commons-collections.jar (394ms)
[info] downloading http://repo1.maven.org/maven2/oro/oro/2.0.8/oro-2.0.8.jar ...
[info] [SUCCESSFUL ] oro#oro;2.0.8!oro.jar (68ms)
[info] :: retrieving :: kafka#hadoop-producer_2.8.0 [sync]
[info] confs: [compile, runtime, test, provided, system, optional, sources, javadoc]
[info] 20 artifacts copied, 0 already retrieved (5377kB/39ms)
[info] == hadoop producer / update ==
[info]
[info] == hadoop consumer / update ==
[info] downloading http://repo1.maven.org/maven2/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar ...
[info] [SUCCESSFUL ] commons-httpclient#commons-httpclient;3.1!commons-httpclient.jar (599ms)
[info] downloading http://repo1.maven.org/maven2/joda-time/joda-time/1.6/joda-time-1.6.jar ...
[info] [SUCCESSFUL ] joda-time#joda-time;1.6!joda-time.jar (396ms)
[info] downloading http://repo1.maven.org/maven2/commons-logging/commons-logging/1.0.4/commons-logging-1.0.4.jar ...
[info] [SUCCESSFUL ] commons-logging#commons-logging;1.0.4!commons-logging.jar (105ms)
[info] downloading http://repo1.maven.org/maven2/commons-codec/commons-codec/1.2/commons-codec-1.2.jar ...
[info] [SUCCESSFUL ] commons-codec#commons-codec;1.2!commons-codec.jar (246ms)
[info] :: retrieving :: kafka#hadoop-consumer_2.8.0 [sync]
[info] confs: [compile, runtime, test, provided, system, optional, sources, javadoc]
[info] 6 artifacts copied, 0 already retrieved (1321kB/18ms)
[info] == hadoop consumer / update ==
[info]
[info] == contrib / update ==
[warn] No dependency configuration found, using defaults.
[info] :: retrieving :: kafka#contrib_2.8.0 [sync]
[info] confs: [default]
[info] 0 artifacts copied, 0 already retrieved (0kB/3ms)
[info] == contrib / update ==
[info]
[info] == java-examples / update ==
[info] :: retrieving :: kafka#java-examples_2.8.0 [sync]
[info] confs: [compile, runtime, test, provided, system, optional, sources, javadoc]
[info] 2 artifacts copied, 0 already retrieved (434kB/12ms)
[info] == java-examples / update ==
[info]
[info] == perf / update ==
[info] :: retrieving :: kafka#perf_2.8.0 [sync]
[info] confs: [compile, runtime, test, provided, system, optional, sources, javadoc]
[info] 2 artifacts copied, 0 already retrieved (434kB/12ms)
[info] == perf / update ==
[info]
[info] == Kafka / update ==
[warn] No dependency configuration found, using defaults.
[info] :: retrieving :: kafka#kafka_2.8.0 [sync]
[info] confs: [default]
[info] 0 artifacts copied, 0 already retrieved (0kB/3ms)
[info] == Kafka / update ==
[success] Successful.
[info]
[info] Total time: 70 s, completed Sep 19, 2011 10:21:23 AM
[info]
[info] Total session time: 71 s, completed Sep 19, 2011 10:21:23 AM
[success] Build completed successfully.
And then you can run the ./sbt package command, which this time printed out the (overall) much nicer output below:
root@my-server~/kafka-hadoop-consumer/kafka# ./sbt package
[info] Building project Kafka 0.7 against Scala 2.8.0
[info] using KafkaProject with sbt 0.7.5 and Scala 2.7.7
[info]
[info] == core-kafka / compile ==
[info] Source analysis: 126 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[warn] there were unchecked warnings; re-run with -unchecked for details
[warn] one warning found
[info] Compilation successful.
[info] Post-analysis: 566 classes.
[info] == core-kafka / compile ==
[info]
[info] == hadoop consumer / compile ==
[info] Source analysis: 12 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[error] Note: Some input files use or override a deprecated API.
[error] Note: Recompile with -Xlint:deprecation for details.
[info] Compilation successful.
[info] Post-analysis: 13 classes.
[info] == hadoop consumer / compile ==
[info]
[info] == java-examples / compile ==
[info] Source analysis: 6 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Compilation successful.
[info] Post-analysis: 6 classes.
[info] == java-examples / compile ==
[info]
[info] == perf / compile ==
[info] Source analysis: 6 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Compilation successful.
[info] Post-analysis: 6 classes.
[info] == perf / compile ==
[info]
[info] == core-kafka / package ==
[warn] No Main-Class attribute will be added automatically added:
[warn] Multiple classes with a main method were detected. Specify main class explicitly with:
[warn] override def mainClass = Some("className")
[info] Packaging ./core/target/scala_2.8.0/kafka-0.7.jar ...
[info] Packaging complete.
[info] == core-kafka / package ==
[info]
[info] == hadoop producer / compile ==
[info] Source analysis: 4 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Compilation successful.
[info] Post-analysis: 5 classes.
[info] == hadoop producer / compile ==
[info]
[info] == hadoop producer / package ==
[info] Packaging ./contrib/hadoop-producer/target/scala_2.8.0/hadoop-producer_2.8.0-0.7.jar ...
[info] Packaging complete.
[info] == hadoop producer / package ==
[info]
[info] == hadoop consumer / package ==
[warn] No Main-Class attribute will be added automatically added:
[warn] Multiple classes with a main method were detected. Specify main class explicitly with:
[warn] override def mainClass = Some("className")
[info] Packaging ./contrib/hadoop-consumer/target/scala_2.8.0/hadoop-consumer_2.8.0-0.7.jar ...
[info] Packaging complete.
[info] == hadoop consumer / package ==
[info]
[info] == perf / package ==
[info] Packaging ./perf/target/scala_2.8.0/kafka-perf-0.7.jar ...
[info] Packaging complete.
[info] == perf / package ==
[info]
[info] == java-examples / package ==
[warn] No Main-Class attribute will be added automatically added:
[warn] Multiple classes with a main method were detected. Specify main class explicitly with:
[warn] override def mainClass = Some("className")
[info] Packaging ./examples/target/scala_2.8.0/kafka-java-examples-0.7.jar ...
[info] Packaging complete.
[info] == java-examples / package ==
[info]
[info] == Kafka / package ==
[info] == Kafka / package ==
[success] Successful.
[info]
[info] Total time: 55 s, completed Sep 19, 2011 10:22:38 AM
[info]
[info] Total session time: 56 s, completed Sep 19, 2011 10:22:38 AM
[success] Build completed successfully.
There are some [warn] and [error] lines but at least the build says it completed successfully, so that should be good enough for now. On to step 2
Step 2
Didn’t have much problem here besides finding the hadoop-setup.sh script. Just use the locate command if you’re lazy like me and don’t want to lookup every directory manually until you find the file
Step 3
Step 3 caused me some more troubles. Let’s review each sub step:
3.1 was okay since I had already followed the quick start guide far enough to have a server running and a simple producer and consumer.
In 3.2, I did not change any of the settings in test.properties since they seemed fine by default. The only one that would seem worth changing might be kafka.server.uri but since for now I am running the Kafka server on the same box where I’m trying to run the hadoop-consumer, the local address should be fine. I did create the HDFS directory mentioned in the input property though (not sure if it would have been created anyway by the DataGenerator program).
EDIT (September 29th 2011)
It turned out, later on, while I was doing Step 4, that not changing the default settings was a mistake. In particular, the kafka.server.uri property must be set to an address that is reachable from the outside.
In my case, I was running the DataGenerator program from the same box where the Kafka broker was running, so the DataGenerator WAS able to reach the broker and so that step seemed to be successful. The problem is the offset file (named “1.dat”) that the DataGenerator program writes in HDFS (in a directory determined by the input property) contains the kafka.server.uri inside it and this is what the (Map/Reduce) hadoop-consumer uses to connect from the Hadoop cluster to the broker in order to import the events. Since my Hadoop nodes were on other machines than the kafka.broker.uri machine, they were never able to reach it and were instead throwing:
java.io.IOException: java.net.ConnectException: Connection refused
Basically, I guess the only situation where it would work to have localhost in the kafka.server.uri property is if you are literally running EVERYTHING in the same box (the kafka broker(s), the DataGenerator program and a single-machine or pseudo-distributed Hadoop cluster).
Step 3.3 is where I’ve had a problem:
root@my-server~/kafka-hadoop-consumer/kafka/contrib/hadoop-consumer# ./run-class.sh kafka.etl.impl.DataGenerator test/test.properties
:./../../core/target/scala_2.8.0/kafka-0.7.jar:./../../contrib/hadoop-consumer/lib_managed/scala_2.8.0/compile/commons-codec-1.2.jar:./../../contrib/hadoop-consumer/lib_managed/scala_2.8.0/compile/commons-httpclient-3.1.jar:./../../contrib/hadoop-consumer/lib_managed/scala_2.8.0/compile/commons-logging-1.0.4.jar:./../../contrib/hadoop-consumer/lib_managed/scala_2.8.0/compile/joda-time-1.6.jar:./../../contrib/hadoop-consumer/lib_managed/scala_2.8.0/compile/jopt-simple-3.2.jar:./../../contrib/hadoop-consumer/lib_managed/scala_2.8.0/compile/log4j-1.2.15.jar:./../../contrib/hadoop-consumer/target/scala_2.8.0/hadoop-consumer_2.8.0-0.7.jar:./../../contrib/hadoop-consumer/lib/avro-1.4.0.jar:./../../contrib/hadoop-consumer/lib/commons-logging-1.0.4.jar:./../../contrib/hadoop-consumer/lib/hadoop-0.20.2-core.jar:./../../contrib/hadoop-consumer/lib/jackson-core-asl-1.5.5.jar:./../../contrib/hadoop-consumer/lib/jackson-mapper-asl-1.5.5.jar:./../../contrib/hadoop-consumer/lib/pig-0.8.0-core.jar:./../../contrib/hadoop-consumer/lib/piggybank.jar:./../../project/boot/scala-2.8.0/lib/scala-library.jar
topics=SimpleTestEvent
server uri:tcp://localhost:9092
send 1000 SimpleTestEvent count events to tcp://localhost:9092
11/09/19 14:42:44 INFO producer.SyncProducer: Connected to localhost:9092 for producing
11/09/19 14:42:45 INFO producer.SyncProducer: Disconnecting from localhost:9092
Exception in thread "main" java.io.IOException: Call to my-namenode/10.10.9.1:8020 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at kafka.etl.impl.DataGenerator.generateOffsets(DataGenerator.java:115)
at kafka.etl.impl.DataGenerator.run(DataGenerator.java:107)
at kafka.etl.impl.DataGenerator.main(DataGenerator.java:138)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
Again, I found the answer in a post on the old (pre-Apache incubation) Kafka mailing list. The DataGenerator class depended on the hadoop-core jar located in contrib/hadoop-consumer/lib. My Hadoop cluster is running cdh3-u1, and it seems the hadoop jars included in Kafka are a different version.
I moved the lib/hadoop-0.20.2-core.jar out of that directory and instead I created a symbolic link to the same hadoop jars that my cluster uses.
cd lib
ln -s /usr/lib/hadoop/hadoop-core.jar hadoop-core.jar
After that, I’m not sure if it was necessary, but I re-ran ./sbt package and I was then able to run the DataGenerator as described in step 3.3
Step 4
Step 4 is still in progress! Here’s what I did so far:
In step 4.1, at first, I didn’t change any of the default properties in test.properties.
In step 4.2, the copy-jars.sh script does not look up the hdfs.default.classpath.dir in any property file, so you have to pass it in instead of the ${hdfs.default.classpath.dir} parameter:
./copy-jars.sh /tmp/kafka/lib
Step 4.3 is giving me some headaches. I’ve been changing some of the test.properties from step 4.1 as well as some HDFS user permission settings in order to try to make it work, and I think I’ve made some progress, but it’s still not fully working.
Once I do get it working completely, I’ll post the details in a follow up article. Sorry for the suspense
!
EDIT (September 29th 2011)
I got back to this the other day and finally got it to work
!
First, I had a problem in my properties, which I’ve documented in another EDIT in Step 3. Then, I got a problem where the hadoop-consumer M/R job WAS able to connect to the Kafka broker server and import some events, but it was looping infinitely and re-importing the same events over and over!
It turns out this is a known and fixed bug, KAFKA-131. The latest pre-packaged stable release they provide is 0.6, but it contains this bug.
I checked out the trunk, packaged it using all the other instructions I documented earlier (including changing the hadoop jars for those I’m using!), I then restarted all the servers (ZK, the broker, etc) and everything worked smoothly
!
Also, this post is useful in understanding what the hadoop-consumer does and what it expects you to do for him. I’m now working on a simple script to automate incremental imports so that we don’t have to do these other operations manually and just have it executed automatically every hour or whatever interval
…
Hopefully, this was helpful for you
! Kafka looks great, I can’t wait to do more with it
!