How to Install Hortonworks Data Platform – HDP 2.6 on AWS

In this post, we show how to install Hortonworks Data Platform (HDP) on AWS.

We start with three machines. We could install Hadoop on these machines by manually downloading and configuring it, but that is very inefficient. Instead, we could use either Cloudera Manager or Ambari. In this tutorial, we are going to use Ambari.

On the first machine, we are going to install the Ambari server. For that, we need to launch three instances on AWS, and we will follow the Ambari guidelines.

Ambari will then install all the required components on the other two machines.

Please note, we will use machines with 16 GB of RAM so that the installation goes smoothly.

Let’s get started.

Step 1 – Launch 3 instances of t2.xlarge type

AWS offers various instance configurations. We are going with t2.xlarge, which has 16 GB of RAM.

For the machine image, we selected CentOS 7. The next step is to select the instance type. As stated earlier, we are going with the t2.xlarge instance type.

In the next step, we add 100 GB of storage. Select the Magnetic volume type, because SSD costs more per GB. For reference, with the default replication factor of 3, HDFS consumes 3 GB of raw storage to give you 1 GB of usable storage, so the magnetic volume type makes more economic sense.
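The storage arithmetic can be checked with a quick shell calculation. This is only a rough sketch: it assumes all three volumes are given entirely to HDFS (ignoring the operating system and other files) and that the replication factor stays at the HDFS default of 3. The variable names are ours.

```shell
# Assumed values: adjust them to match your own cluster.
nodes=3          # number of instances launched
disk_gb=100      # volume size attached to each instance
replication=3    # HDFS default replication factor

raw_gb=$((nodes * disk_gb))
usable_gb=$((raw_gb / replication))
echo "Raw: ${raw_gb} GB, usable after replication: about ${usable_gb} GB"
```

With these numbers, 300 GB of raw disk gives roughly 100 GB of usable HDFS space.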

In the next step, we give the server a name.

After naming the server, the next step is to create a security group. Here, we allow all ports so that nothing is blocked during the demo.

In the next step, you need to create a new key pair.

Amazon provides a private/public key pair feature that saves you from entering a password every time you log in to a machine. It generates a key pair, gives you the private key, and saves the public key on all the machines. This lets you connect to your instances securely and easily.

Download the key pair and save it in your home directory. You will use it to log in to the cluster.

At this point, the three instances are initialized and running.

Step 2 – Change permission of the downloaded key so that nobody else can access it

chmod 400 hadoop-hdp-demo.pem (This makes the key readable only by you, the owner. Nobody else can read it, and nobody can modify it.)
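To see what mode 400 looks like, here is a small sketch that applies it to a throwaway temporary file rather than the real key, then reads the permission bits back. The temporary file stands in for the .pem file.

```shell
# Demonstrated on a throwaway file so the real key is untouched.
key="$(mktemp)"
chmod 400 "$key"

# The first ten characters of ls -l are the file type and permission bits.
perms="$(ls -l "$key" | cut -c1-10)"
echo "$perms"    # -r-------- : read-only for the owner, nothing for anyone else

rm -f "$key"
```

Running the same ls -l check on the downloaded key is a quick way to confirm the change before using it with ssh.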

Step 3 – Give the name to servers

Name one of the nodes “hadoop-ambari-server” and the other two “hadoop-data-node”.

Step 4 – Log in to each node using the downloaded private key

In this step, you need to provide the private key and the public IP address of the node. For example, ssh -i ~/hadoop-hdp-demo.pem centos@54.236.213.53

You can find the public IP address of each instance in the AWS console.

Step 5 – Run “sudo yum update” to update the packages

This updates the packages on the machine. The images provided by Amazon are a bit old, so we need to update the software on them. Yum is the package manager on Red Hat-based systems such as CentOS. Run this command on all of the machines.

Step 6 – Make centos a sudoer on all machines

A sudoer is a user who can perform administrative tasks, so we make centos a sudoer.

Once this is set up, centos can run any command without restriction.
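One common way to do this is a drop-in rule under /etc/sudoers.d. The sketch below builds the rule in a scratch file and syntax-checks it; the drop-in file name and the NOPASSWD choice are our assumptions, not something the Ambari guidelines prescribe.

```shell
# Build the rule in a scratch file first, so nothing on the system changes yet.
rule_file="$(mktemp)"
echo 'centos ALL=(ALL) NOPASSWD: ALL' > "$rule_file"

# Syntax-check the rule before installing it, if visudo is available.
if command -v visudo >/dev/null 2>&1; then
  visudo -cf "$rule_file" || echo "sudoers syntax check failed" >&2
fi

# To install it on a node (needs root), something like:
#   sudo install -m 0440 "$rule_file" /etc/sudoers.d/centos
cat "$rule_file"
```

Checking the syntax first matters because a broken sudoers file can lock you out of sudo on that machine.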

Step 7 – Now on each machine, verify if the hostname is properly set

Run hostname -f on each machine. It prints the fully qualified domain name (FQDN) of the host.
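As a rough check, the output of hostname -f should contain at least one dot. The helper below is a hypothetical convenience of ours, and the "has a dot" rule is only a heuristic, not a full validity test.

```shell
# Succeeds when the name has at least one dot, i.e. it looks fully qualified.
looks_like_fqdn() {
  case "$1" in
    *.*) return 0 ;;
    *)   return 1 ;;
  esac
}

# hostname -f can fail if the name does not resolve; fall back to uname -n.
current="$(hostname -f 2>/dev/null || uname -n)"
if looks_like_fqdn "$current"; then
  echo "OK: $current"
else
  echo "Not fully qualified: $current"
fi
```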

Step 8 – Edit the Network Configuration File

In this step, we set the hostname in the network configuration file.

Please make sure to replace <fqdn> with the fully qualified hostname of the machine. You can get it from the command used in Step 7, i.e. hostname -f
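A minimal sketch of this edit, assuming the file in question is /etc/sysconfig/network with NETWORKING and HOSTNAME entries, as in the Ambari documentation. The sketch writes to a scratch file so it can be tried without root; on a real node the same content would go into /etc/sysconfig/network.

```shell
# Scratch file standing in for /etc/sysconfig/network (assumed target, needs root).
net_file="$(mktemp)"

# Use the name reported by the system; fall back to uname -n if hostname -f fails.
fqdn="$(hostname -f 2>/dev/null || uname -n)"

cat > "$net_file" <<EOF
NETWORKING=yes
HOSTNAME=${fqdn}
EOF

cat "$net_file"
```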

Step 9 – Configuring IP tables

Here we disable the firewall. Doing so during installation keeps blocked ports from interfering with the services we install and configure.
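As a rough sketch, disabling the firewall on CentOS 7 could look like the following. It assumes firewalld is the active firewall service, which may not hold on every image (some use the iptables service, and some ship with no firewall service at all). The run helper is hypothetical: it only prints each command unless DRY_RUN=0 is set.

```shell
# Dry run by default: prints each command. Set DRY_RUN=0 to execute (needs sudo).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop the firewall now and keep it from starting on the next boot.
run sudo systemctl stop firewalld
run sudo systemctl disable firewalld
```

Printing the commands first makes it easy to review them before running them on all three machines.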

Step 10 – sudo yum install unzip

Step 11 – sudo yum install wget

Step 12 – Disable SELinux

After editing the SELinux configuration file, save it.

We then have to reboot the machine for the change to take effect.
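The edit itself can be sketched as a one-line substitution. This example assumes the file is /etc/selinux/config, the usual location on CentOS, and works on a scratch copy with sample contents so it runs without root.

```shell
# Scratch copy standing in for /etc/selinux/config, with sample contents.
conf="$(mktemp)"
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$conf"

# Switch the mode line to disabled and leave every other setting alone.
sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf" > "$conf.new" && mv "$conf.new" "$conf"

cat "$conf"
# On a real node, "sudo setenforce 0" switches to permissive mode until the reboot.
```

The pattern is anchored on SELINUX= so that the SELINUXTYPE line is not touched.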

Add umask 0022 as the last line ## A umask value of 022 gives permissions of 755 to new folders and 644 to new files
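The effect of the umask is easy to verify. The sketch below sets umask 0022 inside a subshell, creates a file and a folder in a scratch directory, and prints their permission bits.

```shell
# Use a scratch directory and a subshell so the umask change does not leak out.
demo_dir="$(mktemp -d)"
(
  umask 0022
  cd "$demo_dir" || exit 1
  touch new_file
  mkdir new_dir
)

file_perms="$(ls -ld "$demo_dir/new_file" | cut -c1-10)"
dir_perms="$(ls -ld "$demo_dir/new_dir" | cut -c1-10)"
echo "file: $file_perms"   # -rw-r--r-- (644)
echo "dir:  $dir_perms"    # drwxr-xr-x (755)

rm -rf "$demo_dir"
```

New files start from 666 and new folders from 777, and the umask removes the write bit for group and others in both cases.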

Step 13 – Install NTP

Enable the NTP service so that it comes up automatically on every reboot.
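A minimal sketch of this step on CentOS 7, assuming the ntp package and its ntpd service. The run helper is hypothetical and only prints each command unless DRY_RUN=0 is set.

```shell
# Dry run by default: prints each command. Set DRY_RUN=0 to execute (needs sudo).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run sudo yum install -y ntp
run sudo systemctl enable ntpd   # start automatically on every boot
run sudo systemctl start ntpd    # start now, without waiting for a reboot
```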

Step 14 – Setup passwordless access for Ambari

Copy the private key to the node where we will install the Ambari server.

Now check that you can log in to the other two nodes from the Ambari server node.
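The login check can be scripted. This sketch is meant to be run on the Ambari server node and assumes the key was copied to the home directory under its original name. The DATA_NODES variable and the run and check_node helpers are hypothetical; by default the commands are only printed.

```shell
# Dry run by default: prints each command. Set DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumed location of the copied key on the Ambari server node.
key="$HOME/hadoop-hdp-demo.pem"

# BatchMode makes ssh fail instead of prompting, so a password prompt counts as a failure.
check_node() {
  run ssh -o BatchMode=yes -i "$key" "centos@$1" hostname -f
}

# Space-separated data node addresses, e.g. DATA_NODES="10.0.0.11 10.0.0.12" (made-up values).
for node in ${DATA_NODES:-}; do
  check_node "$node"
done
```

If each node prints its hostname without asking for a password, passwordless access is working.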

Step 15 – Install MySQL server on the last data node

(During the secure installation, set the root password, disallow remote root login, and remove the test database and anonymous users.)

Step 16 – Create a database for Oozie

Step 17 – Create a database for Hive

Step 18 – Create a database for Ranger
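Steps 16 to 18 can be sketched together by generating the SQL statements into a file and reviewing them before applying. The database names follow the steps above; the user names (same as the database names), the '%' host pattern, and the placeholder password are our assumptions and should be replaced with your own values.

```shell
# Placeholder credential: set DB_PASSWORD to a real value before use.
db_password="${DB_PASSWORD:-change-me}"
sql_file="$(mktemp)"

# One database, one user, and one grant per service.
for db in oozie hive ranger; do
  cat >> "$sql_file" <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE USER '${db}'@'%' IDENTIFIED BY '${db_password}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${db}'@'%';
EOF
done
echo "FLUSH PRIVILEGES;" >> "$sql_file"

cat "$sql_file"
# To apply on the MySQL host: mysql -u root -p < "$sql_file"
```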

Step 19 – Download and check the MySQL connector on the Ambari server host

Step 20 – Install Java on each machine apart from the Ambari server

Step 21 – Install JCE on all hosts apart from the Ambari server

 http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html

Step 22 – Install Ambari server

Summary

We launched three CentOS nodes on AWS, each with at least 16 GB of RAM and a 100 GB magnetic volume, and opened all ports for demo purposes. We then prepared the nodes and installed Ambari 2.6 on one of them.