VMware vSphere Big Data Extensions (Serengeti) Installation
About VMware vSphere Big Data Extensions
VMware vSphere Big Data Extensions lets you deploy and centrally operate big data clusters running on VMware vSphere.
Big Data Extensions simplifies the Hadoop and HBase deployment and provisioning process, and gives you a real-time view of the running services and the status of their virtual hosts.
It provides a central place from which to manage and monitor your big data cluster, and incorporates a full range of tools to help you optimize cluster performance and utilization.
System Requirements for Big Data Extensions
Before you begin the Big Data Extensions deployment tasks, your system must meet all of the prerequisites for vSphere, clusters, networks, storage, hardware, and licensing. Big Data Extensions requires that you install and configure vSphere and that your environment meets minimum resource requirements. Make sure that you have licenses for the VMware components of your deployment.
Before you install Big Data Extensions, set up the following VMware products.
- Install vSphere 5.5 (or later) Enterprise or Enterprise Plus.
- When you install Big Data Extensions on vSphere 5.5 or later, use VMware® vCenter™ Single Sign-On to provide user authentication. When you log in to vSphere 5.5 or later, authentication is passed to the vCenter Single Sign-On server, which you can configure with multiple identity sources such as Active Directory and OpenLDAP. On successful authentication, your user name and password are exchanged for a security token that is used to access vSphere components such as Big Data Extensions.
- Configure all ESXi hosts to use the same Network Time Protocol (NTP) server.
- On each ESXi host, add the NTP server to the host configuration, and from the host configuration's Startup Policy list, select Start and stop with host. The NTP daemon ensures that time-dependent processes occur in sync across hosts.
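Before relying on time-dependent processes across hosts, it can help to confirm that the configured NTP server actually answers. The following Python sketch sends a minimal SNTP query (RFC 4330); the server name in the usage comment is a placeholder, not a value from this guide.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800

def parse_transmit_time(data):
    """Extract the 64-bit transmit timestamp (bytes 40-47) from an SNTP
    response and convert it to a Unix timestamp."""
    seconds, fraction = struct.unpack("!II", data[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32

def query_sntp(server, timeout=5.0):
    """Send a minimal SNTP client request (LI=0, VN=3, Mode=3) to UDP
    port 123 and return the server's transmit time."""
    packet = b"\x1b" + 47 * b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
    return parse_transmit_time(data)

# Usage (requires network access; "ntp.example.com" is a placeholder):
#   import time
#   offset = abs(query_sntp("ntp.example.com") - time.time())
```

If the query times out from a host's network, the ESXi NTP daemon on that host will not be able to synchronize either.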
Configure your cluster with the following settings.
- Enable vSphere HA and VMware vSphere® Distributed Resource Scheduler.
- Enable Host Monitoring.
- Enable admission control and set the policy you want. The default policy is to tolerate one host failure.
- Set the virtual machine restart priority to high.
- Set virtual machine monitoring to VM and Application Monitoring.
- Set the monitoring sensitivity to high.
- Enable vMotion and Fault Tolerance logging.
- Verify that all hosts in the cluster have hardware virtualization (VT) enabled in the BIOS.
- Verify that the Management Network VMkernel port has vMotion and Fault Tolerance logging enabled.
Big Data Extensions can deploy clusters on a single network or use multiple networks. The environment determines how port groups that are attached to NICs are configured and which network backs each port group.
You can use either a standard vSwitch or a vSphere Distributed Switch (vDS) to provide the port group backing a Serengeti cluster. A vDS acts as a single virtual switch across all attached hosts, while a vSwitch is configured per host, so the port group must be created manually on each host.
When you configure your networks to use with Big Data Extensions, verify that the following ports are open as listening ports:
- Ports 8080 and 8443 are used by the Big Data Extensions plug-in user interface and the Serengeti Command-Line Interface Client.
- Port 5480 is used by vCenter Single Sign-On for monitoring and management.
- Port 22 is used by SSH clients.
- To avoid opening a network firewall port to access Hadoop services, log in to the Hadoop client node and access your cluster from that node.
- To connect to the internet (for example, to create an internal yum repository from which to install Hadoop distributions), you may use a proxy.
- To enable communications, be sure that firewalls and web filters do not block the Serengeti Management Server or other Serengeti nodes.
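A quick way to verify the listening ports above from a client machine is a TCP connect probe. This sketch uses a placeholder hostname for the Serengeti Management Server; substitute your own.

```python
import socket

# Ports listed above that Big Data Extensions expects to be reachable.
BDE_PORTS = {
    8080: "Big Data Extensions plug-in UI / Serengeti CLI",
    8443: "Big Data Extensions plug-in UI / Serengeti CLI (TLS)",
    5480: "monitoring and management",
    22: "SSH",
}

def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0

def check_ports(host, ports=tuple(BDE_PORTS)):
    """Probe each port and report whether it accepted a connection."""
    return {port: check_port(host, port) for port in ports}

# Usage ("serengeti.example.com" is a placeholder for your management server):
#   for port, is_open in check_ports("serengeti.example.com").items():
#       print(port, "open" if is_open else "closed/filtered")
```

A port reported closed/filtered here usually points at a firewall or web filter between the client and the Serengeti nodes.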
Direct Attached Storage
Attach and configure direct attached storage on the physical controller to present each disk separately to the operating system. This configuration is commonly described as Just A Bunch Of Disks (JBOD). Create VMFS data stores on direct attached storage using the following disk drive recommendations.
- 8-12 disk drives per host. The more disk drives per host, the better the performance.
- 1-1.5 disk drives per processor core.
- 7,200 RPM Serial ATA (SATA) disk drives.
Do not use Big Data Extensions in conjunction with vSphere Storage DRS
Big Data Extensions places virtual machines on hosts according to available resources, Hadoop best practices, and user-defined placement policies before it creates the virtual machines. For this reason, do not deploy Big Data Extensions on vSphere environments in combination with Storage DRS. Storage DRS continuously balances storage space usage and storage I/O load to meet application service levels in specific environments. If Storage DRS is used with Big Data Extensions, it disrupts the placement policies of your big data cluster virtual machines.
Migrating virtual machines in vCenter Server may disrupt the virtual machine placement policy
Big Data Extensions places virtual machines based on available resources, Hadoop best practices, and user-defined placement policies that you specify. For this reason, DRS is disabled on all virtual machines created within the Big Data Extensions environment. While this prevents vSphere from automatically migrating the virtual machines, it does not prevent you from inadvertently moving them through the vCenter Server user interface. Doing so may break the placement policy defined by Big Data Extensions, for example the number of instances per host and the group associations.
Resource Requirements for the vSphere Management Server and Templates
- Resource pool with at least 27.5 GB of RAM.
- 40 GB or more (recommended) of disk space for the management server and Hadoop template virtual disks.
Resource Requirements for the Hadoop Cluster
- Datastore free space of at least the total size required by the Hadoop cluster, plus a swap disk for each Hadoop node equal to the requested memory size.
- A network configured across all relevant ESXi hosts that has connectivity with the network used by the management server.
- vSphere HA is enabled for the master node if vSphere HA protection is needed. To use vSphere HA or vSphere FT to protect the Hadoop master node, you must use shared storage.
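The free-space rule above reduces to simple arithmetic: the total requested disk plus one swap disk per node sized to the node's memory. A sketch with made-up node sizes:

```python
def datastore_free_space_needed_gb(nodes):
    """Minimum datastore free space: each node's requested disk plus a
    swap disk equal to the node's requested memory size."""
    return sum(node["disk_gb"] + node["memory_gb"] for node in nodes)

# Hypothetical four-node cluster; the sizes are illustrative only.
cluster = [
    {"role": "master", "disk_gb": 50, "memory_gb": 8},
    {"role": "worker", "disk_gb": 100, "memory_gb": 4},
    {"role": "worker", "disk_gb": 100, "memory_gb": 4},
    {"role": "worker", "disk_gb": 100, "memory_gb": 4},
]
print(datastore_free_space_needed_gb(cluster))  # 370
```

So this example cluster needs at least 370 GB of free datastore space before provisioning.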
Hardware Requirements for the vSphere and Big Data Extensions Environment
Use host hardware that is listed in the VMware Compatibility Guide. For optimal performance, install your vSphere and Big Data Extensions environment on the following hardware.
- Dual quad-core CPUs or better, with Hyper-Threading enabled. If you can estimate your computing workload, consider using a more powerful CPU.
- Use High Availability (HA) and dual power supplies for the master node's host machine.
- 4-8 GB of memory for each processor core, with 6% overhead for virtualization.
- Use a 1 Gb Ethernet interface or faster to provide adequate network bandwidth.
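The per-core memory guideline translates into a quick sizing formula. The 6 GB-per-core figure below is an assumed midpoint of the 4-8 GB range, not a value from this guide.

```python
def host_memory_needed_gb(cores, gb_per_core=6, overhead=0.06):
    """Physical RAM for a host: gb_per_core for each processor core,
    plus the ~6% virtualization overhead noted above."""
    return cores * gb_per_core * (1 + overhead)

# A dual quad-core host has 8 physical cores:
# 8 cores * 6 GB * 1.06 gives roughly 51 GB of RAM.
print(host_memory_needed_gb(8))
```

Scale `gb_per_core` toward 8 for memory-heavy workloads such as HBase region servers.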
Tested Host and Virtual Machine Support
The maximum host and virtual machine support that has been confirmed to successfully run with Big Data Extensions is 256 physical hosts running a total of 512 virtual machines.
You must use a vSphere Enterprise license or above to use VMware vSphere HA and vSphere DRS.
What's New in vSphere Big Data Extensions 2.1
Big Data Extensions enables the rapid deployment of Hadoop clusters on a VMware vSphere virtual platform.
- Integration with Partner Hadoop Management Tools. Through the Big Data Extensions GUI or CLI, you can seamlessly create virtual machines for a Hadoop cluster and use a Cloudera Manager or Apache Ambari application manager to perform the software installation and cluster management.
- HBase-Only Clusters. You can create an HBase-only cluster that points to an existing Apache Hadoop HDFS or EMC Isilon OneFS.
- Compute-Worker-Only Clusters. You can create a compute-worker-only cluster that contains only MapReduce compute worker nodes (for example, TaskTracker or NodeManager) and add them to an existing physical or virtual Hadoop cluster.
- Updated Support for the Latest Hadoop Distributions. In addition to the previously supported Hadoop distributions, Big Data Extensions also supports Apache Bigtop 0.8.0, Cloudera CDH 5.1.3, Hortonworks HDP 1.3.8, and Pivotal PHD 2.0.1 and 2.1.
- Big Data Extensions Upgrade. You can upgrade Big Data Extensions 2.0 to the current version, Big Data Extensions 2.1, and preserve all the data for the clusters created in Big Data Extensions. All of your existing clusters can be managed by Big Data Extensions once the upgrade completes.
Download the OVA file from the VMware website:
Download Big Data Extensions 2.1.0
After the download completes, log in to vSphere and import the OVA file.
Name the Serengeti server.
Select the HDD capacity for the deployment.
Provide a static IP address for the Serengeti server.
The VMware Serengeti Server is now deployed.
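The import steps above can also be scripted with VMware's ovftool CLI. The sketch below only builds the command line; the file name, datastore, vCenter inventory path, and the OVF property key used for the static IP are all placeholders (list an OVA's real properties by running ovftool against the file before deploying).

```python
import shlex

def build_ovftool_command(ova, name, datastore, target, props):
    """Assemble an ovftool invocation that names the appliance, selects
    a datastore, and passes OVF properties (for example, a static IP)."""
    cmd = ["ovftool", "--acceptAllEulas",
           "--name=" + name, "--datastore=" + datastore]
    cmd += ["--prop:%s=%s" % (key, value) for key, value in props.items()]
    cmd += [ova, target]
    return cmd

cmd = build_ovftool_command(
    ova="VMware-BigDataExtensions-2.1.0.ova",           # placeholder file name
    datastore="datastore1",                             # placeholder datastore
    name="Serengeti-Management-Server",
    target="vi://administrator@vcenter.example.com/DC/host/Cluster",  # placeholder vCenter path
    props={"vami.ip0.Management_Server": "10.0.0.50"},  # hypothetical static-IP property key
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Scripting the deployment makes the name, disk, and static-IP choices from the steps above repeatable across environments.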
Remember: to update SSO from the BDE Serengeti Management Server, open a browser and visit http://
Once the vCenter BDE plug-in has been registered, you’ll need to log out and log in to the vCenter Web Client. At this point you should be able to see the Big Data Extensions plug-in listed in the left frame of the Web Client.
Select “Big Data Extensions” and then select the “Register Serengeti Server” link. A window opens in which you can browse the list of available VMs until you find the Serengeti Management Server. Select the target server and click “Test Connection.”