In this post, we will show you how you can install Hortonworks Data Platform on AWS.
We start with three machines. We could install Hadoop on them by manually downloading and configuring each component, but that is very inefficient. Instead, we can use a management tool such as Cloudera Manager or Ambari. In this tutorial, we are going to use Ambari.
On the first machine, we are going to install the Ambari server. For that, we need to launch three instances on Amazon, following the Ambari guidelines.
Ambari will then install all the required components on the other two machines.
Please note, we will use machines with 16 GB of RAM so that the installation goes smoothly.
Let’s get started.
Step 1 – Launch 3 instances of t2.xlarge type
AWS offers various configurations. The one we are going with has 16 GB of RAM, which is the t2.xlarge. You can see below how the AWS console looks.
As you can see in the above image, we have selected CentOS 7. The next step is to select the instance type. As stated earlier, we are going with the t2.xlarge instance type.
In the next step, we will add storage of 100 GB. Please make sure that you select the Magnetic volume type, because SSD costs more and provides less storage for the money. For reference, HDFS replicates each block three times by default, so it consumes 3 GB of raw storage for every 1 GB of data you store; it therefore makes more economic sense to go with the Magnetic volume type.
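To put rough numbers on that, assuming the default HDFS replication factor of 3 and that only the two data nodes store HDFS blocks:
raw capacity    = 2 data nodes x 100 GB    = 200 GB
usable capacity = 200 GB / 3 (replication) = ~66 GB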
In the next step, we give a name to the server.
After you give a name to the server, the next step is to create a security group. Here, we are opening all the ports so that there are no restrictions.
In the next step, you need to create a new key pair.
Amazon provides a private & public key feature. It takes away the headache of typing a password every time you log in to each machine. Amazon generates a private-public key pair, gives you the private key, and saves the public key on all the machines. This allows you to connect to your instances securely and easily.
Download the key pair and save it in your home directory; you will use it to log in to your cluster.
As you can see below, we have successfully initialized the three instances.
Step 2 – Change permission of the downloaded key so that nobody else can access it
chmod 400 hadoop-demo.pem
(This makes the key readable only by you, the owner; nobody else can read or modify it.)
Step 3 – Give names to the servers
Name one of the nodes “hadoop-ambari-server” and the other two “hadoop-data-node”.
Step 4 – Log in to each of the nodes using the downloaded private key
In this step, you need to provide the private key and the public IP address of the node. For example:
ssh -i ~/hadoop-demo.pem centos@54.236.213.53
You can locate the public IP address in the AWS console as shown in the image below.
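Optionally, if you would rather not type the key path and IP address every time, you can add an entry to ~/.ssh/config. The alias below is just an example; substitute your own IP:
Host hadoop-ambari-server
    HostName 54.236.213.53
    User centos
    IdentityFile ~/hadoop-demo.pem
After that, ssh hadoop-ambari-server is enough to log in.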
Step 5 – Run “sudo yum update” to update the packages
It will update the packages on all the machines, because the images provided by Amazon are a bit old, so we need to update the software on them. Yum is the package manager on Red Hat (and CentOS) machines. Run this command on all of the machines.
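If you would rather not repeat this manually on every node, a small shell loop works too. The first IP below comes from the earlier example and the rest are placeholders for your other instances; -t allocates a terminal so sudo behaves properly over ssh:
for ip in 54.236.213.53 <ip-2> <ip-3>; do
  ssh -t -i ~/hadoop-demo.pem centos@$ip 'sudo yum -y update'
done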
Step 6 – Make centos a sudoer on all machines
A sudoer is somebody who can perform administrative tasks; therefore, we make centos a sudoer.
sudo visudo
Add the following line:
centos ALL=(ALL) ALL
With this entry, centos will be able to run any command as root without restriction.
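You can verify the entry took effect by listing the commands centos is allowed to run:
sudo -l -U centos
The output should include (ALL) ALL.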
Step 7 – Now on each machine, verify that the hostname is properly set
hostname -f
The above command prints the fully qualified domain name (FQDN) of the machine.
Step 8 – Edit the Network Configuration File
In this step, we are setting up the hostname properly.
sudo vi /etc/sysconfig/network
Set the following values:
NETWORKING=yes
HOSTNAME=<fqdn>
Please make sure to replace <fqdn> with the proper hostname. You can get this from the command used in step 7, i.e. hostname -f.
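For example, on the Ambari server node the file might end up looking like this (the hostname below is only an illustration; use the output of hostname -f on your machine):
NETWORKING=yes
HOSTNAME=ip-172-31-54-74.ec2.internal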
Step 9 – Disable the firewall
sudo systemctl disable firewalld
sudo service firewalld stop
Here we are disabling the firewall so that it does not block communication between the nodes while we install and configure services.
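To confirm the firewall is stopped and will stay off after a reboot:
sudo systemctl status firewalld   (should show "inactive (dead)")
sudo systemctl is-enabled firewalld   (should print "disabled")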
Step 10 – sudo yum install unzip
Step 11 – sudo yum install wget
Step 12 – Disable SELinux
sudo setenforce 0   (disables SELinux for the current session)
To make the change permanent, open the config file with sudo vi /etc/selinux/config and set:
SELINUX=disabled
Save the file and reboot the machine.
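After the reboot, you can confirm SELinux stayed off:
getenforce   (should print "Disabled")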
Next, set the default umask for all users:
sudo vi /etc/profile
Add umask 0022 at the end of the file. ## An umask value of 022 grants default permissions of 755 (rwxr-xr-x) for new folders and 644 (rw-r--r--) for new files
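A quick way to see the effect is to subtract the umask from the base permissions (777 for directories, 666 for files):
umask 022
mkdir d && touch f
ls -ld d f   # d -> drwxr-xr-x (777 - 022 = 755), f -> -rw-r--r-- (666 - 022 = 644)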
Step 13 – Install NTP
sudo systemctl disable chronyd.service
sudo yum install -y ntp
sudo systemctl start ntpd
sudo systemctl enable ntpd
Enabling the service ensures that NTP comes back up after a reboot.
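You can check that ntpd is running and syncing with its peers:
sudo systemctl status ntpd   (should show "active (running)")
ntpq -p   (lists the NTP peers being polled)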
Step 14 – Setup passwordless access for Ambari
Copy the private key to the node where we will install the Ambari server:
scp -i ~/hadoop-demo.pem hadoop-demo.pem centos@54.236.213.53:
Now check that you are able to log in to the other two nodes from the Ambari server node:
ssh -i hadoop-demo.pem centos@ip-172-31-54-74.ec2.internal
ssh -i hadoop-demo.pem centos@ip-172-31-52-125.ec2.internal
Step 15 – Install MySQL server on the last data node
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
sudo yum update
sudo yum install mysql-server
sudo systemctl start mysqld
mysql_secure_installation
(set root password, disallow remote root login, remove test database and anonymous users)
Step 16 – Create a database for Oozie
mysql -u root -p
CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie123';
GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'%';
FLUSH PRIVILEGES;
CREATE DATABASE oozie;
Step 17 – Create a database for Hive
mysql -u root -p
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive123';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%';
FLUSH PRIVILEGES;
CREATE DATABASE hive;
Step 18 – Create a database for Ranger
mysql -u root -p
CREATE USER 'ranger'@'%' IDENTIFIED BY 'ranger123';
GRANT ALL PRIVILEGES ON *.* TO 'ranger'@'%';
FLUSH PRIVILEGES;
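Before moving on, it is worth confirming that the databases and users exist. A quick check from the shell (you will be prompted for the root password):
mysql -u root -p -e "SHOW DATABASES; SELECT user, host FROM mysql.user;"
You should see the oozie and hive databases, and the oozie, hive and ranger users with host %.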
Step 19 – Download and check the MySQL connector on the Ambari server host
sudo yum install mysql-connector-java*
ls -lh /usr/share/java/mysql-connector-java.jar
Step 20 – Install Java on each machine apart from the Ambari server
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.rpm"
sudo yum localinstall jdk-8u161-linux-x64.rpm
sudo rm jdk-8u161-linux-x64.rpm
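Verify the installation on each node:
java -version   (should report java version "1.8.0_161")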
Step 21 – Install JCE on all hosts apart from the Ambari server
Download the JCE policy files from:
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
sudo unzip -o -j -q jce_policy-8.zip -d /usr/java/jdk1.8.0_161/jre/lib/security
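One way to confirm that the unlimited-strength policy is active is to check the maximum allowed AES key length; it prints 2147483647 when the restriction is lifted:
jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"));'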
Step 22 – Install Ambari server
sudo wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.6.1.3/ambari.repo -O /etc/yum.repos.d/ambari.repo
sudo yum repolist
sudo yum install ambari-server
sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
sudo ambari-server start
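Once the server starts, confirm it is running and then open the Ambari web UI on port 8080 (the default login is admin/admin):
sudo ambari-server status
Then browse to http://<ambari-server-public-ip>:8080 from your machine.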
Summary
We launched 3 CentOS nodes on AWS, each with at least 16 GB of RAM and 100 GB of Magnetic storage. We then installed Ambari 2.6 on one of the nodes. Finally, we opened all the ports for demo purposes.