Open Stack Parasol Installation

From genomewiki
Revision as of 17:35, 28 June 2018 by Hiram (talk | contribs) (add category tags)

Prerequisites

This discussion assumes you are familiar with Unix shell command line programming and scripting. You will be encountering and interacting with csh/tcsh, bash, perl, and python scripting languages.

This entire discussion assumes the bash shell is the user's unix shell.

It also assumes your Open Stack credentials are already installed and that you can run the Open Stack command line functions (see: CLI OpenStack command line clients).

Procedure

The steps described in this procedure install the Parasol job control system on the Open Stack cloud platform. These procedures could be adapted for use on other types of cloud computing platforms.

Parasol Hub

The parasol hub machine instance needs to start up first.

There are two scripts used for this procedure: startParaHub.sh and paraHubSetup.sh. Both are currently obtained from the genomewiki:


 wget -qO startParaHub.sh 'http://genomewiki.ucsc.edu/images/d/df/OpenStackStartParaHub.sh.txt'
 chmod 755 startParaHub.sh
 wget -qO paraHubSetup.sh 'http://genomewiki.ucsc.edu/images/a/a0/OpenStackParaHubSetup.sh.txt'
 # paraHubSetup.sh does not need execute permissions here

Verify your Open Stack login credentials are functioning and your network connection is valid to your Open Stack system:

openstack server list

That command should not fail. Now make sure your ssh keys are set up and valid. If you do not yet have a key pair, create one:

ssh-keygen -t rsa

Upload the id_rsa.pub file into the 'Key Pairs' section of the Access and Security section on your openstack system.
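
The two checks above can also be scripted. This is a minimal sketch and not part of the genomewiki scripts: the function names check_openstack and upload_key, and the key pair name passed to upload_key, are hypothetical. It assumes the openstack keypair create command of the standard client as a command line alternative to the web upload.

```shell
# Sketch only: check_openstack and upload_key are hypothetical helper names.

# Succeeds only when the openstack client exists and can list servers.
check_openstack() {
  command -v openstack > /dev/null || { echo "openstack client not found" >&2; return 1; }
  openstack server list > /dev/null || { echo "openstack server list failed" >&2; return 1; }
}

# Creates the key pair only when it is missing (ssh-keygen will prompt for a
# passphrase), then registers the public key under the given key pair name.
upload_key() {
  local keyName="$1"
  local keyFile="${2:-$HOME/.ssh/id_rsa}"
  [ -f "${keyFile}" ] || ssh-keygen -t rsa -f "${keyFile}"
  openstack keypair create --public-key "${keyFile}.pub" "${keyName}"
}
```

A possible use would be: check_openstack && upload_key yourNameKey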

Now the startParaHub.sh command can be run:

./startParaHub.sh 0

The 0 argument is merely an identifier marker to make this hub machine name unique. There is no reason to run more than one parasol hub machine, but if you do, give them different number identifiers to keep them separate.

What this script does:

  • names the machine instance: yourNameHub_0
  • starts the machine with the openstack server create command
  • the paraHubSetup.sh script is passed to the machine via this create command
  • sleeps 280 seconds to allow the machine time to start up
  • attaches a floating ip address to the machine for access from outside the Open Stack LAN
  • polls, by ssh, the machine to wait for a completion signal file from the paraHubSetup.sh process
  • creates ssh keys on the machine for the standard user login
  • registers this machine in the parasol configuration as a node resource, offering all but two of its CPUs
  • records results of these procedures in a log file in directory: ./logs/
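
The polling step in this list can be sketched as a small retry loop. This is an illustration only, not the code in startParaHub.sh: the function name wait_for_signal, the retry count, and the delay are assumptions.

```shell
# Sketch only: wait_for_signal is a hypothetical name; the default retry
# count and delay are assumptions, not values from startParaHub.sh.
wait_for_signal() {
  local target="$1" signalFile="$2" tries="${3:-30}" delay="${4:-10}"
  local i=0
  while [ "${i}" -lt "${tries}" ]; do
    # 'test -f' exits 0 on the remote machine once the signal file exists
    if ssh -o ConnectTimeout=10 "${target}" test -f "${signalFile}"; then
      return 0
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  echo "timed out waiting for ${signalFile} on ${target}" >&2
  return 1
}
```

For example, with a hypothetical login name: wait_for_signal user@10.50.103.98 /tmp/hub.machine.ready.signal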

During this first boot, the paraHubSetup.sh script runs on the machine instance and performs the following procedures:

  • establishes a /data/ directory hierarchy
  • exports this /data/ directory by NFS to the local network for use by the node machines
  • installs parasol management scripts in /data/parasol/
  • installs, using yum install, the system software for the commands these procedures use
  • installs kent command line programs and scripts, to be used in tool chain processing, in /data/bin/ and /data/scripts/
  • signals completed processing with the file: /tmp/hub.machine.ready.signal
  • records these processing steps in a log file on the machine: /tmp/startUpScript.log.$$
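
The last two items, logging and the completion signal, follow a common first-boot pattern that can be sketched as below. This is an assumption about the general shape, not the contents of paraHubSetup.sh; run_setup is a hypothetical name.

```shell
# Sketch only: run_setup is hypothetical; the real paraHubSetup.sh may be
# organized differently.
run_setup() {
  local logFile="$1" signalFile="$2" status=0
  shift 2
  {
    echo "# setup started: $(date)"
    # "$@" is the setup command to run; its output lands in the log file
    "$@" || status=$?
    if [ "${status}" -eq 0 ]; then
      # the signal file is created only after a successful setup
      touch "${signalFile}"
      echo "# setup completed: $(date)"
    else
      echo "# setup failed with status ${status}: $(date)"
    fi
  } >> "${logFile}" 2>&1
  return "${status}"
}
```

For example: run_setup /tmp/startUpScript.log.$$ /tmp/hub.machine.ready.signal mkdir -p /data/parasol

Writing the signal file only on success means a poller waiting for it never mistakes a failed setup for a finished one; the log file then holds the clues.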

When problems arise, the recorded log files can be examined to find clues of what went haywire.

Parasol nodes

After the parasol hub machine has started, note its internal IP address, either from your Open Stack web control site or from the command line query:

openstack server list
genomebrowser-net=10.109.0.122, 10.50.103.98 | CentOS-7.1-x86_64 | z1.medium |
   | 53593c73-ab46-4e2a-825e-75b91f753da9 | hiramHub_0

In this example it is the 10.109.0.122 address. This address is used in the startParaNode.sh script. It directs the node machine to the resources provided by the hub machine. The other address, 10.50.103.98, is the floating ip address, which provides access from outside the Open Stack LAN.
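
The internal address can also be extracted by script. A minimal sketch, assuming the addresses field has the network=ip, ip form shown in the listing above (other client versions may print it differently); internal_ip is a hypothetical helper name.

```shell
# Sketch only: internal_ip is a hypothetical helper. It expects a string such
# as "genomebrowser-net=10.109.0.122, 10.50.103.98" and prints the first
# address, which in the example above is the internal one.
internal_ip() {
  local addresses="$1"
  addresses="${addresses#*=}"        # drop the leading "network-name="
  printf '%s\n' "${addresses%%,*}"   # keep the text before the first comma
}
```

For example: internal_ip "$(openstack server show hiramHub_0 -f value -c addresses)"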

There are two scripts used for this procedure: startParaNode.sh and paraNodeSetup.sh.template. Both are currently obtained from the genomewiki:


 wget -qO startParaNode.sh 'http://genomewiki.ucsc.edu/images/5/59/OpenStackStartParaNode.sh.txt'
 chmod 755 startParaNode.sh
 wget -qO paraNodeSetup.sh.template 'http://genomewiki.ucsc.edu/images/c/c6/OpenStackParaNodeSetup.sh.template.txt'
 # paraNodeSetup.sh.template does not need execute permissions here

Verify your Open Stack connection is valid:

openstack server list

That should show your parasol hub machine running.

Run the script with the IP address argument and a unique number identifier:

./startParaNode.sh 10.109.0.122 0

Each machine instance requires its own unique number identifier to keep the machine names distinct.

The procedures performed by this command:

  • names the machine instance: yourNameNode_0
  • sed edits the paraNodeSetup.sh.template to paraNodeSetup.sh to insert the specified IP address
  • starts the machine instance with openstack server create command
  • waits 280 seconds to allow the machine time to start up
  • adds a floating ip address to the machine for access outside the Open Stack LAN
  • waits for the /tmp/node.machine.ready.signal to appear in the machine instance
  • copies the /data/parasol/nodeInfo/id_rsa ssh key to ~/.ssh/id_rsa to allow localhost ssh
  • runs the command /data/parasol/nodeInfo/nodeReport.sh 0 on the machine to report this node to the parasol hub
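
The sed editing step in this list can be sketched as follows. The placeholder token HUB_IP_ADDRESS and the function name fill_template are assumptions made for illustration; check paraNodeSetup.sh.template for the token it actually uses.

```shell
# Sketch only: HUB_IP_ADDRESS is an assumed placeholder token, not
# necessarily the one in the real paraNodeSetup.sh.template.
fill_template() {
  local hubIp="$1" template="$2" output="$3"
  # replace every occurrence of the placeholder with the hub IP address
  sed -e "s/HUB_IP_ADDRESS/${hubIp}/g" "${template}" > "${output}"
}
```

For example: fill_template 10.109.0.122 paraNodeSetup.sh.template paraNodeSetup.sh

The generated file could then be handed to openstack server create through its --user-data option, which is presumably how the setup script reaches the new machine.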

The paraNodeSetup.sh script sent to the machine by the openstack server create command performs the following procedures as the machine starts:

  • installs system software with yum install for networking, perl, wget, vim and bc commands
  • mounts the NFS filesystem /data/ from the parasol hub machine at the specified IP address
  • sets completion signal file /tmp/node.machine.ready.signal

To start several machines at the same time:

 for N in 0 1 2 3 4 5
 do
   ./startParaNode.sh 10.109.0.122 ${N} &
 done
 wait
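
The loop above does not report whether any of the background starts failed. A variant that waits on each job separately and counts failures could look like this sketch; start_nodes is a hypothetical wrapper, and it assumes startParaNode.sh exits non-zero on failure.

```shell
# Sketch only: start_nodes is a hypothetical wrapper around ./startParaNode.sh
# and assumes that script exits non-zero when a start fails.
start_nodes() {
  local hubIp="$1" pids="" failed=0 n pid
  shift
  for n in "$@"; do
    ./startParaNode.sh "${hubIp}" "${n}" &
    pids="${pids} $!"                # remember each background job's PID
  done
  for pid in ${pids}; do
    # 'wait PID' returns the exit status of that particular job
    wait "${pid}" || failed=$((failed + 1))
  done
  echo "${failed} of $# node start(s) failed"
  [ "${failed}" -eq 0 ]
}
```

For example: start_nodes 10.109.0.122 0 1 2 3 4 5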

Start parasol system

After the parasol hub and node machines have successfully started, log in to the parasol hub machine and run the commands:

 cd /data/parasol
 ./initParasol initialize       # verifies all ssh keys are present and function OK
 ./initParasol start            # this starts the parasol system
 parasol status                 # to see status of system

The system is now ready to run the tool chain processing procedures.