How To Install Hadoop On Windows 10 Using VirtualBox
By: Jeff Levy
Hadoop Distributed File System Overview
This step-by-step tutorial will walk you through how to install Hadoop on a Linux virtual machine on Windows 10. Even though you can install Hadoop directly on Windows, I am opting to install Hadoop on Linux because Hadoop was created on Linux and its routines are native to the Linux platform.
Prerequisites
Apache recommends that a test cluster have the following minimum specifications:
- 2 CPU Cores
- 8 GB RAM
- 30 GB Hard Drive Space
Step 1: Setting up the Virtual Machine
- Download and install Oracle VirtualBox (the Virtual Machine host).
- Download the Linux VM image. There is no need to install it yet, just have it downloaded. Take note of the download location of the iso file, as you will need it in a later step for installation.
- This tutorial will be using Ubuntu 18.04.2 LTS. You may choose to use another Linux platform such as RedHat; however, the commands and screenshots used in this tutorial will be relevant to the Ubuntu platform.
- The Ubuntu iso can be found here.
- Now, open the Oracle VM VirtualBox Manager and select Machine → New.
- Choose a Name and Location for your Virtual Machine. Then select the Type as 'Linux' and the Version as 'Ubuntu (64-bit)'. Select 'Next' to go to the next dialogue.
- Select the appropriate memory size for your Virtual Machine. Be mindful of the minimum specs outlined in the prerequisites section of this article. Click 'Next' to go on to the next dialogue.
- Choose the default, which is 'Create a virtual hard disk now'. Click the 'Create' button.
- Choose the VDI hard disk file type and click 'Next'.
- Choose 'Dynamically allocated' and select 'Next'.
- Choose the hard drive space reserved for the Virtual Machine and hit 'Create'.
- At this point, your VM should be created! Now go back to the Oracle VM VirtualBox Manager and start the Virtual Machine. You can start your machine by right-clicking your new instance and choosing Start → Normal Start.
- After selecting Start, you will be prompted to add a start-up disk. You will need to navigate your file system to where you saved your Ubuntu ISO file.
At this point, you will be taken to an Ubuntu installation screen. The process is straightforward and should be self-explanatory. The installation will only take a few minutes. We're getting close to starting up our Hadoop instance!
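Once the installer finishes and the VM reboots, a quick way to confirm which release you ended up with is to read the OS release file from a terminal (an optional check, not part of the original steps; it works on any modern Ubuntu image):

```shell
# Print the installed distribution's name and version.
grep PRETTY_NAME /etc/os-release
```

For the image used in this tutorial you should see a line mentioning Ubuntu 18.04.2 LTS.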
Step 2: Setup Hadoop
Prerequisite Installations
Next, it's necessary to first install some prerequisite software. Once logged into your Linux VM, simply run the following commands in a Linux terminal window to install the software.
- JAVA: Terminal Command:
$ sudo apt install openjdk-11-jre-headless
- ssh: Terminal Command:
$ sudo apt-get install ssh
- pdsh: Terminal Command:
$ sudo apt-get install pdsh
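One gotcha worth handling right after the installs (an addition beyond the original steps): on Ubuntu, pdsh defaults to rsh as its remote command type, which can make Hadoop's start-up scripts fail later with connection errors. Pointing it at ssh avoids this:

```shell
# Tell pdsh to use ssh instead of its rsh default, for this session and future ones.
echo 'export PDSH_RCMD_TYPE=ssh' >> ~/.bashrc
export PDSH_RCMD_TYPE=ssh
```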
Download and Unpack Hadoop
Now let's download and unpack Hadoop.
- To download Hadoop, enter the following command in the terminal window:
$ wget http://www.gtlib.gatech.edu/pub/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
- To unpack Hadoop, enter the following commands in the terminal window:
$ tar -xvf hadoop-3.3.0.tar.gz
$ mv hadoop-3.3.0 hadoop
$ sudo mv hadoop/ /usr/share/
$ export HADOOP_HOME=/usr/share/hadoop
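Note that the export above lasts only for the current shell session. To make HADOOP_HOME survive reboots, you can append it to ~/.bashrc (a common convention, not from the original article; adjust the path if you unpacked Hadoop elsewhere):

```shell
# Persist HADOOP_HOME and put Hadoop's scripts on the PATH for future shells.
echo 'export HADOOP_HOME=/usr/share/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
```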
Setting the JAVA_HOME Environment Variable
Navigate to the 'etc/hadoop/hadoop-env.sh' file and open it in a text editor. Find the 'export JAVA_HOME' statement and replace it with the following line:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
It should look like the picture below.
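As a quick sanity check (optional, not part of the original steps), you can confirm that the directory assigned to JAVA_HOME actually exists; java-11-openjdk-amd64 is Ubuntu's default directory name for the OpenJDK 11 package and may differ on other distributions:

```shell
# Report whether the JAVA_HOME target directory is present on this machine.
JH=/usr/lib/jvm/java-11-openjdk-amd64
if [ -d "$JH" ]; then
  echo "JAVA_HOME target exists: $JH"
else
  echo "Not found: $JH -- run 'ls /usr/lib/jvm' to see available JVMs"
fi
```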
Standalone Operation
The first mode we will be looking at is Local (Standalone) Mode. This method allows you to run a single Java process in non-distributed mode on your local instance. It is not run by any Hadoop daemons or services.
- Navigate to your Hadoop directory by entering the following command in the terminal window:
$ cd $HADOOP_HOME
- Next, run the following command, which prints Hadoop's usage documentation and confirms the installation:
$ bin/hadoop
The output should look similar to the following:
- Next, we will try running a simple Pi estimator program, which is included in the Hadoop release. Try running the following command in the terminal window:
$ sudo bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 16 1000
The output should look similar to the following:
Pseudo-Distributed Operation
Another alternative to Standalone mode is Pseudo-Distributed mode. Under this mode, each Hadoop daemon/service runs as a separate Java process.
- Navigate to etc/hadoop/core-site.xml for editing and add the following XML code inside the 'configuration' tags in the core-site.xml file.
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
It should look like this:
- Navigate to etc/hadoop/hdfs-site.xml for editing and add the following XML code inside the 'configuration' tags in the hdfs-site.xml file.
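Per the Apache single-cluster guide listed in the references, the standard hdfs-site.xml setting for pseudo-distributed mode is a block replication factor of 1, since there is only one DataNode:

```xml
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
```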
- Check that you can ssh to the localhost without a passphrase. If you are prompted for a password, enter the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Starting the NameNode and DataNodes
- The first thing you want to do before executing in pseudo-distributed mode is to format the filesystem. Execute the following command in your HADOOP_HOME directory:
$ bin/hdfs namenode -format
- Next, start the NameNode and DataNode daemons/services by entering the following command:
$ sbin/start-dfs.sh
- After starting the instance, go to http://localhost:9870 in your favorite browser. The following screen should appear:
When navigating to the Datanode tab, we see that we have 1 node.
In addition to the nodes, you can see "Browse Directory."
In Part Two of this article, I will dive deeper into the functionality of the NameNode and DataNode(s) as well as show how to ingest data into the Hadoop ecosystem.
References:
- https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Installing_Software
- Sams Teach Yourself Hadoop in 24 Hours by Jeffrey Aven, 2017, at Amazon
Questions?
Thanks for reading! We hope you found this blog post useful. Do let us know if you have any questions or topic ideas related to BI, analytics, the cloud, machine learning, SQL Server, (Star Wars), or anything else of the like that you'd like us to write about. Simply leave us a comment below, and we'll see what we can do!
Keep your data analytics sharp by subscribing to our mailing list
Get fresh Key2 content around Business Intelligence, Data Warehousing, Analytics, and more delivered right to your inbox!
Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.
Source: https://key2consulting.com/install-hadoop-on-linux-virtual-machine-on-windows-10/