
Posts

Showing posts from October, 2015

Hive : java.sql.SQLException: Access denied for user (using password: YES)

You may face this error with Hive during query execution or Hive installation. It is basically related to MySQL:

Caused by: java.sql.SQLException: Access denied for user 'hive'@'hivehost' (using password: YES)

This error is caused by insufficient privileges for your user. Follow the steps below to solve it:

mysql> CREATE USER 'hive'@'hivehost' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'hivehost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'hivehost';
mysql> FLUSH PRIVILEGES;
mysql> quit;

Note that you need to create the user for each host from which you are going to access the database. Refer to this URL for Hive installation: http://querydb.blogspot.in/2015/10/hive-installation-step-by-step-with.html
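For reference, the granted user and password above have to match what Hive itself uses to reach the metastore database. A minimal hive-site.xml sketch, assuming the metastore database is named 'metastore' and MySQL listens on the default port on 'hivehost' (adjust the host, database name and password to your setup):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hivehost:3306/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <!-- must match the user created with CREATE USER above -->
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <!-- must match the password given in IDENTIFIED BY above -->
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>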

Hive: NullPointerException in collect_set() UDF

When using a Hive version older than 0.14, you may get the exception below:

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
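If upgrading to Hive 0.14 or later is not an option, one workaround to try, assuming the NullPointerException is triggered by NULL values reaching collect_set() (an assumption on my part, not something confirmed in the post), is to filter NULLs out before aggregating. A sketch with hypothetical table and column names:

hive> -- my_table, id and col are hypothetical names
hive> SELECT id, collect_set(col)
    > FROM my_table
    > WHERE col IS NOT NULL
    > GROUP BY id;

Disabling map-side aggregation (set hive.map.aggr=false;) is sometimes suggested for GroupByOperator failures as well, but treat that as a tuning experiment rather than a guaranteed fix.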

HBASE installation

1) Download tar file "hbase-0.96.2-hadoop2-bin.tar.gz" 2) Unpack this tar on each machine of your HBase installtion. 3) Edit .bashrc export HBASE_HOME=/opt/ds/app/hbase-0.96.2-hadoop2 export PATH=$PATH:$HBASE_HOME/bin 4) Execute >source .bashrc 5) Verify hbase version > hbase version 2015-10-29 13:33:48,002 INFO  [main] util.VersionInfo: HBase 0.96.2-hadoop2 6) Edit "hbase-env.sh" Update JAVA_HOME export HBASE_MANAGES_ZK=true export HBASE_PID_DIR=/var/hbase/pids 7) Update "hbase-site.xml" on Master  <configuration>   <property>     <name>hbase.rootdir</name>     <value>hdfs://abcdHost:54310/hbase</value>   </property>    <property>     <name>hbase.cluster.distributed</name>     <value>true</value>   </property>   <property>     <name>hbase.zookeeper.property.dataDir</name>     <value>hdfs://

Apache OOZIE installation step-by-step on Ubuntu

1) Download "oozie-4.1.0.tar.gz" 2) Gunzip and Untar @ /opt/ds/app/oozie 3) Change directory to  /opt/ds/app/oozie/oozie-4.1.0 4) Execute      bin/mkdistro.sh -DskipTests -Dhadoopversion=2.2.0 5) Change directory to /opt/ds/app/oozie/oozie-4.1.0/distro/target/oozie-4.1.0-distro/oozie-4.1.0 6) Edit '.bashrc' and add export OOZIE_VERSION=4.1.0 export OOZIE_HOME=/opt/ds/app/oozie/oozie-4.1.0/distro/target/oozie-4.1.0-distro/oozie-4.1.0 export PATH=$PATH:$OOZIE_HOME/bin 7) Change directory to /opt/ds/app/oozie/oozie-4.1.0/distro/target/oozie-4.1.0-distro/oozie-4.1.0 8) Make directory 'libext' 9) Execute: >cp /opt/ds/app/oozie/oozie-4.1.0/hcataloglibs/target/oozie-4.1.0-hcataloglibs.tar.gz . >tar xzvf oozie-4.1.0-hcataloglibs.tar.gz >cp oozie-4.1.0/hadooplibs/hadooplib-2.3.0.oozie-4.1.0/* libext/ >cd libext/ 10) Download 'ext-2.2.zip'and place it in 'libext/' directory 11) Add below properti

Flume: ERROR : HDFSEventSink. process failed

I'm using Hadoop 2.2 as a sink in Flume 1.4. If you try to use HDFS, then you might get the exception below:

[ERROR]  HDFSEventSink. process failed
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$RecoverLeaseRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

The above exception is due to incompatibility of the protobuf libraries. So, a possible solution that I followed is to rename 2 jar files in /opt/ds/app/flume-1.4.0/lib:
> protobuf-java-2.4.1.jar-1
> guava-10.0.1.jar-1
Renaming these 2 jars will cause Flume not to load it and
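A minimal shell sketch of that workaround, assuming the Flume lib directory from the post (the classpath wildcard only picks up *.jar files from lib/, so the renamed copies are ignored and the protobuf/guava versions shipped with Hadoop 2.2 are used instead):

> cd /opt/ds/app/flume-1.4.0/lib
> mv protobuf-java-2.4.1.jar protobuf-java-2.4.1.jar-1
> mv guava-10.0.1.jar guava-10.0.1.jar-1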

Flume installation step-by-step

1) Download "apache-flume-1.4.0-bin.tar.gz" 2) Gunzip and Untar the file at /opt/ds/app/flume-1.4.0 3) Change directory to /opt/ds/app/flume-1.4.0/conf 4) Optionally you can edit log directory flume.log.dir=/var/log/flume 5) Edit .bashrc export FLUME_HOME=/opt/ds/app/flume-1.4.0 export FLUME_CONF_DIR=/opt/ds/app/flume-1.4.0/conf export FLUME_CLASSPATH=$FLUME_CONF_DIR export PATH=$PATH:$FLUME_HOME/bin 6) execute from shell > source .bashrc 7) Copy jar to /opt/ds/app/flume-1.4.0/lib hadoop-auth-2.2.0.jar hadoop-common-2.2.0.jar 8) From shell > flume-ng --help

Pig step-by-step installation with integrated HCatalog

1) Download tar file "pig-0.13.0.tar.gz" 2)  Gunzip and Untar the file at / opt/ds/app/pig-0.13.0 3) Change directory to  /opt/ds/app/pig-0.13.0/conf 4) Create log4j.properties from template file 5) Update pig.properties for HCatalog. For example: hcat.bin=/opt/ds/app/hive-0.13.0/hcatalog/bin/hcat 6) Edit .bashrc export PIG_HOME=/opt/ds/app/pig-0.13.0 export PATH=$PATH:$PIG_HOME/bin export HCAT_HOME=/opt/ds/app/hive-0.13.0/hcatalog export PATH=$PATH:$HCAT_HOME/bin 7) It is assumed that you have already set  HADOOP_HOME, JAVA_HOME, HADOOP_COMMON_LIB_NATIVE_DIR, HADOOP_OPTS, YARN_OPTS 8) Optionally, you can create .pigbootup in User home directory 9) Execute command from user home directory > source .bashrc 10) Execute > pig -useHCatalog 11) Say you had created a table in Hive with name "hivetesting". Now, try to load with below command to verify installation. grunt> A  = LOAD 'hivetesting' USING org.apache.hcatalog

Hive Installation step- by-step with MySQL Metastore

1) Download Hive "apache-hive-0.13.0-bin.tar.gz"
2) Gunzip and untar at path /opt/ds/app/hive-0.13.0
3) Edit ~/.bashrc and add the lines below:
#HIVE
export HIVE_HOME=/opt/ds/app/hive-0.13.0
export PATH=$PATH:$HIVE_HOME/bin
4) Change directory to /opt/ds/app/hive-0.13.0/conf
5) Create hive-log4j.properties from the template
6) Create hive-env.sh from the template. Also set:
#  if [ "$SERVICE" = "cli" ]; then
   if [ -z "$DEBUG" ]; then
     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
   else
     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
   fi
 fi
# The heap size of the jvm started by the hive shell script can be controlled via:
# export HADOOP_HEAPSIZE="1024"
export HADOOP_CLIENT_OPTS="-Xmx${HADOO
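The excerpt ends inside hive-env.sh. Two things that typically follow for a MySQL-backed metastore, stated here as an assumption rather than from the truncated post: put the MySQL JDBC driver on Hive's classpath and point hive-site.xml at the database (see the "Access denied" post above for the connection properties), then run a quick smoke test.

> cp mysql-connector-java-<version>-bin.jar $HIVE_HOME/lib/   (driver jar name/version is hypothetical)
> hive
hive> show databases;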

Oozie Coordinator Job Scheduling frequency

Sometimes you may see a situation where the coordinator job specifies a frequency in minutes, say 60, but the workflow jobs run more frequently. On the Oozie web console you can see the 'Created Time' incrementing more frequently, while the 'Nominal Time' increments by an hour, which is the interval you want. The issue here is that the start date in the coordinator XML is in the past. In this scenario, Oozie will submit workflows for all the intervals that were missed, starting from the start time until it gets in sync with the current time. **The nominal time is the actual time interval (hour) that the workflow is supposed to process. So, in such a situation you might want to set Concurrency, to decide how many actions run in parallel; an Execution strategy, which can be FIFO, LIFO, or LAST_ONLY; or Throttle, to decide how many jobs can be in waiting status if one is already running. Example:
<controls>
<concurrency>1</concurrency>
<execu
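The <controls> example is cut off. A complete block typically looks like the sketch below; the concurrency value comes from the post, while the timeout, execution and throttle values are illustrative:

<controls>
  <timeout>10</timeout>
  <concurrency>1</concurrency>
  <execution>FIFO</execution>
  <throttle>2</throttle>
</controls>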

Oozie hangs on a single node for work flow with fork-Join

For each action, a job is launched by the Oozie launcher, which in turn executes your actual action job. For example, if you have a workflow with a fork that runs two actions in parallel, it is actually going to launch 4 MR jobs. So, at this point of time four Map slots are required (two for the MR launchers, two for the actual jobs). But, by default, only two Map slots are available on each node. That is, only two Map tasks will run at any point of time on a node. This is specified by the `mapred.tasktracker.map.tasks.maximum` property, which defaults to 2, in the mapred-site.xml file. The available slots are occupied by the 2 launchers, and so the 2 actual jobs wait for free slots. This happens in pseudo-distributed mode. So maybe one can run it on a cluster, or change the "mapred.tasktracker.map.tasks.maximum" property in mapred-site.xml (see the sketch below). Note that setting the property in pseudo-distributed mode may not even work, so one might have to bump memory and cores too.
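A sketch of the mapred-site.xml change mentioned above, with an illustrative value of 4 so that the two launcher jobs and the two actual action jobs can hold map slots at the same time (on a YARN cluster, per-node capacity is governed instead by NodeManager settings such as yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, which is likely what "bump Memory and Cores" refers to):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>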