PIG INTERVIEW QUESTIONS

What is Pig?
Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs.

What is the difference between logical and physical plans?
Pig undergoes some …
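
Pig's compiler turns a script into a sequence of MapReduce jobs. As a rough illustration only, the plain-Python sketch below emulates how a grouping script such as `grouped = GROUP words BY word; counts = FOREACH grouped GENERATE group, COUNT(words);` breaks down into a map phase, a shuffle that groups values by key, and a reduce phase. The function names are hypothetical, and this is not how Pig is implemented internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(records: Iterable[str]) -> List[Tuple[str, int]]:
    # Map: emit (key, 1) for each input record, like the map side of GROUP ... BY word.
    return [(word, 1) for word in records]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: collect all values that share a key, as Hadoop does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: aggregate each group, like FOREACH grouped GENERATE group, COUNT(words).
    return {key: sum(values) for key, values in groups.items()}


def group_count(records: Iterable[str]) -> Dict[str, int]:
    # The three phases chained together, as one compiled MapReduce job would run them.
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(group_count(["pig", "hive", "pig"]))  # prints {'pig': 2, 'hive': 1}
```

A real Pig script with several GROUP or JOIN steps would compile into several such jobs, each reading the previous job's output.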

HDFS Commands

Print the Hadoop version:
$ hadoop version
Print the Java version:
$ java -version
List the contents of the root directory in HDFS:
$ hadoop dfs -ls /
Report the amount of space used and available on the currently mounted filesystem:
$ hadoop dfs -df hdfs:/
Count the number of directories, files and bytes under the paths that match the specified file pattern:
$ hadoop dfs …
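
The `-count` command reports, per path, a directory count, a file count and the content size in bytes. The Python sketch below computes the same three numbers for a local directory tree, assuming (as HDFS does) that the starting directory itself is included in the directory count. It is a local-filesystem analogue for illustration, not an HDFS client. Note also that on Hadoop 2.x and later, `hadoop dfs` is deprecated in favor of `hdfs dfs` (or `hadoop fs`).

```python
import os
from typing import Tuple


def count_tree(path: str) -> Tuple[int, int, int]:
    """Return (dir_count, file_count, content_size) for a local directory tree,
    in the same column order that `hdfs dfs -count` prints."""
    dir_count = file_count = content_size = 0
    # os.walk visits every directory once, starting with `path` itself,
    # and does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        dir_count += 1
        for name in files:
            file_count += 1
            # lstat: size of the entry itself, so broken symlinks do not raise.
            content_size += os.lstat(os.path.join(root, name)).st_size
    return dir_count, file_count, content_size
```

For example, `print(*count_tree("/tmp"), "/tmp")` prints the columns in the DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME order used by `-count`.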

Query to find stuck queries in MSSQL

select r.*, t.text
from sys.dm_exec_requests r
outer apply sys.dm_exec_sql_text(r.sql_handle) t
where r.start_time < getdate()
  and r.start_time > getdate() - (1.0/24)
  and t.text like '%service_segment_ms%';

Here, getdate() returns the current time, and getdate() - (1.0/24) subtracts 1/24 of a day, so only requests that started in the past hour are returned. The table name (here service_segment_ms) appears in the column text, which holds each request's SQL text. If you want, you can give …
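
In SQL Server, subtracting a number from a `datetime` value subtracts that many days, which is why `getdate()-(1.0/24)` means "one hour ago". The Python sketch below mirrors the two filters in the `where` clause (the one-hour window and the `like '%service_segment_ms%'` text match) over in-memory rows. The `Request` record and its fields are hypothetical stand-ins for columns of `sys.dm_exec_requests` and `sys.dm_exec_sql_text`, not a database client.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Request:
    # Hypothetical stand-in for one row of the joined DMV output.
    session_id: int
    start_time: datetime
    text: str


# 1.0/24 of a day, the same offset as getdate()-(1.0/24) in T-SQL.
ONE_HOUR = timedelta(days=1.0 / 24)


def recent_matching(requests: Iterable[Request], needle: str, now: datetime) -> List[Request]:
    # Keep requests that started within the last hour and whose SQL text contains
    # `needle`, like LIKE '%needle%'. This check is case-sensitive, whereas
    # SQL Server's LIKE follows the column collation (often case-insensitive).
    lower = now - ONE_HOUR
    return [r for r in requests if lower < r.start_time < now and needle in r.text]
```

A request that has been running for much longer than you expect, and that shows up here every time you run the query, is a good candidate for a stuck query.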

LINUX COMMANDS

Basic Linux Commands

File Commands
ls : directory and file listing
ls -al : long listing of all files, including hidden files
cd dirname : change directory to dirname
cd : change to home directory
pwd : show current directory
mkdir dirname : create a directory
rm filename : delete a file
rm -r dirname : delete a directory and its contents
rm -f filename : force-remove a file
rm -rf dirname : force remove …

COMMISSIONING & DECOMMISSIONING OF NODES FROM HADOOP CLUSTER

Introduction:
One of the most attractive features of Hadoop is its use of commodity hardware, on which DataNode crashes in a Hadoop cluster are frequent. Another is the ease of scaling in step with rapid growth in data volume. Hence, one of the most common tasks of a Hadoop administrator is to commission and decommission DataNodes and Task tracker …
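
On a Hadoop cluster, a DataNode is typically decommissioned by adding its hostname to the exclude file named by `dfs.hosts.exclude` in `hdfs-site.xml` (for a TaskTracker, `mapred.hosts.exclude` in `mapred-site.xml`) and then running `hadoop dfsadmin -refreshNodes` (or `hadoop mradmin -refreshNodes`). The Python sketch below only automates the idempotent edit of such an exclude file; the file path is an assumption that must match your own configuration, the helper names are hypothetical, and the refresh commands are still run by hand.

```python
from pathlib import Path
from typing import List


def _read_hosts(path: Path) -> List[str]:
    # One hostname per line; blank lines are ignored; a missing file means no hosts.
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def _write_hosts(path: Path, hosts: List[str]) -> None:
    path.write_text("".join(h + "\n" for h in hosts))


def add_to_exclude(path: Path, host: str) -> bool:
    """Add `host` to the exclude file (decommission). Returns True if the file changed."""
    hosts = _read_hosts(path)
    if host in hosts:
        return False
    _write_hosts(path, hosts + [host])
    return True


def remove_from_exclude(path: Path, host: str) -> bool:
    """Remove `host` from the exclude file so it can rejoin after -refreshNodes.
    Returns True if the file changed."""
    hosts = _read_hosts(path)
    if host not in hosts:
        return False
    _write_hosts(path, [h for h in hosts if h != host])
    return True
```

After `add_to_exclude(Path("/path/from/dfs.hosts.exclude"), "datanode3")` returns True, run `hadoop dfsadmin -refreshNodes` and wait for the NameNode to report the node as decommissioned before shutting it down.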

Apache Drill Setup

Complete the following steps to install Drill:
In a terminal window, change to the directory where you want to install Drill.
To get Apache Drill, download the latest version from the Drill web site, or run one of the following commands (which fetch version 1.3.0), depending on which download tool you have installed on your system:
$ wget http://getdrill.org/drill/download/apache-drill-1.3.0.tar.gz
Copy the …
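
The downloaded file is a gzipped tarball. For scripted installs, the Python sketch below does the download and extraction with the standard library; it assumes the 1.3.0 URL quoted above is still reachable (old releases are often moved to archive mirrors), and it uses `tarfile`'s safer `data` extraction filter when the running Python supports it. The function names are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List

# URL from the step above; other releases live at different paths.
DRILL_URL = "http://getdrill.org/drill/download/apache-drill-1.3.0.tar.gz"


def download(url: str, dest: Path) -> Path:
    # Fetch the archive to `dest`, much like the wget command above.
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive: Path, install_dir: Path) -> List[str]:
    """Extract a .tar.gz into install_dir and return its sorted top-level entry names."""
    install_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        top_level = sorted({m.name.split("/")[0] for m in tar.getmembers()})
        if hasattr(tarfile, "data_filter"):
            # Rejects absolute paths, "../" escapes and other unsafe members.
            tar.extractall(install_dir, filter="data")
        else:
            tar.extractall(install_dir)
    return top_level
```

Usage, assuming network access: `extract(download(DRILL_URL, Path("apache-drill-1.3.0.tar.gz")), Path.home() / "drill")`.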

Apache Storm Installation Guide

Note: this installation does not require sudo; logs and other data maintained by ZooKeeper and Storm are kept in my home folder.
Create a directory for Storm and enter it:
mkdir storm
cd storm
Create a data directory:
mkdir -p datadir/zookeeper
Download ZooKeeper and extract it:
wget http://apache.mirrors.spacedump.net/zookeeper/current/zookeeper-3.4.8.tar.gz (or the appropriate version)
tar -xvf zookeeper-3.4.8.tar.gz
Download Storm and unzip …
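
A standalone ZooKeeper reads its settings from `conf/zoo.cfg`, and the data directory created above only takes effect once `dataDir` points at it. The Python sketch below writes a minimal config with the `tickTime`, `dataDir` and `clientPort` keys from ZooKeeper's standalone getting-started guide. The directory names follow the steps above; 2000 ms and port 2181 are the commonly used defaults and are assumptions to adjust, and `write_zoo_cfg` is a hypothetical helper.

```python
from pathlib import Path


def write_zoo_cfg(zk_home: Path, data_dir: Path, client_port: int = 2181,
                  tick_time_ms: int = 2000) -> Path:
    """Write a minimal standalone ZooKeeper config to <zk_home>/conf/zoo.cfg."""
    cfg = zk_home / "conf" / "zoo.cfg"
    cfg.parent.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)
    cfg.write_text(
        f"tickTime={tick_time_ms}\n"
        # Absolute path, so the server's working directory does not matter.
        f"dataDir={data_dir.resolve()}\n"
        f"clientPort={client_port}\n"
    )
    return cfg
```

Run from inside the `storm` directory as `write_zoo_cfg(Path("zookeeper-3.4.8"), Path("datadir/zookeeper"))`, then start the server with `zookeeper-3.4.8/bin/zkServer.sh start`.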

Kafka Installation

Download and extract Kafka:
wget "http://www-eu.apache.org/dist/kafka/1.0.1/kafka_2.12-1.0.1.tgz" -O ~/Downloads/kafka.tgz
tar -xvzf ~/Downloads/kafka.tgz
(The remaining steps assume the extracted files are in ~/kafka and that the server is started from that directory.)
Configure the Kafka server:
vi ~/kafka/config/server.properties
Kafka releases before 1.0.0 do not allow you to delete topics by default; from 1.0.0 on, delete.topic.enable defaults to true. To be able to delete topics on any version, add the following line at the end of ~/kafka/config/server.properties:
delete.topic.enable = true
Start the Kafka server:
nohup ./bin/kafka-server-start.sh config/server.properties …
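
Appending the line by hand works, but repeating the step leaves duplicate keys in `server.properties`. The Python sketch below sets a key idempotently: it rewrites every active (uncommented) entry for that key, or appends one if none exists. `set_property` is a hypothetical helper, and it assumes simple `key=value` or `key = value` lines; it does not implement the full Java properties syntax (line continuations, escapes, `:` separators).

```python
import re
from pathlib import Path


def set_property(path: Path, key: str, value: str) -> None:
    """Set key=value in a simple .properties file, replacing active entries or appending one."""
    lines = path.read_text().splitlines() if path.exists() else []
    # Matches "key=" or "key =" at line start; commented lines ("#key=...") do not match.
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")
```

For the step above: `set_property(Path.home() / "kafka/config/server.properties", "delete.topic.enable", "true")`, then restart the broker so it picks up the change.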

Apache Sqoop Performance Tuning Benchmark

Problem 1: Reduce the time taken to run a Sqoop job.
Solution 1: Increase the number of parallel tasks by using an appropriate value for the -m parameter.
Q: How do you determine the right value for -m?
Ans: Hadoop 2.x (HDI 3.x) uses YARN, and each YARN task is assigned a container which has a memory …
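
The answer is cut off here, but the idea it starts on can be sketched: under YARN, the number of map tasks that can run at the same time is bounded by how many map containers fit into the memory each NodeManager offers. The Python sketch below computes that upper bound from `yarn.nodemanager.resource.memory-mb` (memory per node) and `mapreduce.map.memory.mb` (memory per map container). It is an estimate under stated assumptions: it ignores the ApplicationMaster's own container, vcore limits and other jobs sharing the cluster, and `max_concurrent_mappers` is a hypothetical helper.

```python
def max_concurrent_mappers(nodes: int, node_memory_mb: int, map_container_mb: int) -> int:
    """Upper bound on simultaneously running map containers across the cluster.

    node_memory_mb   ~ yarn.nodemanager.resource.memory-mb on each node
    map_container_mb ~ mapreduce.map.memory.mb
    """
    if nodes <= 0 or node_memory_mb <= 0 or map_container_mb <= 0:
        raise ValueError("all arguments must be positive")
    # A container cannot span nodes, so count whole containers per node, then sum.
    return nodes * (node_memory_mb // map_container_mb)
```

For example, 4 nodes with 8192 MB each and 1536 MB map containers give 4 × 5 = 20, so values of -m above 20 would only queue extra mappers. YARN also rounds each container request up to a multiple of `yarn.scheduler.minimum-allocation-mb`, so pass the rounded size; and a higher -m puts more concurrent load on the source database.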

Steps to Set Up an Apache Hadoop and Hive Pseudo-Distributed Mode Cluster

1. Set up Hadoop
Step 1: Install Java
$ sudo apt-get install openjdk-7-jdk
Check the Java version:
$ java -version
java version "1.7.0_91"
OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.15.04.1)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
Step 2: Set up your hosts file
sudo vi /etc/hosts
Change the following IP address to the one obtained in the previous …
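
A pseudo-distributed cluster runs every Hadoop daemon on one machine and points HDFS clients at a local NameNode. Following the Apache Hadoop single-node setup guide, the usual minimal settings are `fs.defaultFS` = `hdfs://localhost:9000` in `core-site.xml` and `dfs.replication` = `1` in `hdfs-site.xml`, since a single DataNode cannot hold more than one replica. The Python sketch below renders those two files with the standard-library `xml.etree.ElementTree`; the host and port are assumptions to adapt (for example, to the hostname you put in `/etc/hosts`), and `hadoop_site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_site_xml(properties: Dict[str, str]) -> str:
    """Render a Hadoop *-site.xml <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


# Minimal pseudo-distributed settings (assumed host and port; adjust to your setup).
CORE_SITE = hadoop_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
HDFS_SITE = hadoop_site_xml({"dfs.replication": "1"})
```

On Hadoop 2.x these strings would be written to `etc/hadoop/core-site.xml` and `etc/hadoop/hdfs-site.xml` under the Hadoop installation directory, replacing the empty `<configuration>` elements shipped there.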