How to create a custom parcel in a Cloudera cluster.

Install Python 3 as a parcel in a Cloudera cluster.

Many of you might have come across a scenario where you wanted to install a package with a consistent version across all nodes in a Cloudera cluster. For example, what would you do if you wanted a newer Python version on every node in the cluster?

Such packages are hard to manage by hand: if one node has Python 2.7 and another has Python 3, Spark and YARN jobs can behave differently or fail depending on where they run. Parcels come to the rescue here. This article walks through creating a parcel that installs and manages Python 3 on the nodes of the cluster.

Installing and configuring Miniconda

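1. Download Miniconda: https://docs.conda.io/en/latest/miniconda.html

2. Install Miniconda in /usr/local/miniconda3 (the -p option sets the install location; confirm it when the installer prompts):

bash Miniconda3-latest-Linux-x86_64.sh -p /usr/local/miniconda3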

3. Go to the directory:

cd /usr/local/miniconda3

4. Install the required Python version:

bin/conda install python=3.6.10

5. Check the Python version:

bin/python --version

6. At this point you can also install any other Python libraries you need, for example with bin/conda install or bin/pip install.
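The version check in step 5 can also be scripted, which is handy when the same check has to be repeated later. The following is a minimal sketch, not part of the original procedure: check_python_version is a hypothetical helper name, and it assumes the interpreter prints its version in the usual "Python X.Y.Z" form.

```shell
# Hypothetical helper: succeed only if the given interpreter reports the expected version.
check_python_version() {
    python_bin=$1
    expected=$2
    # "python --version" prints e.g. "Python 3.6.10". Some older releases write it
    # to stderr, so both streams are captured before taking the second word.
    actual=$("$python_bin" --version 2>&1 | awk '{print $2}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $python_bin is Python $actual"
    else
        echo "MISMATCH: $python_bin is Python $actual, expected $expected" >&2
        return 1
    fi
}

# Usage with the paths from the steps above:
#   check_python_version /usr/local/miniconda3/bin/python 3.6.10
```

The function returns a non-zero status on a mismatch, so it can stop a build script before a parcel with the wrong interpreter is packaged.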

Prepare the environment to build the parcel

1. Install the build tools:

a. Install Git:

yum install -y git

b. Install the Java JDK (the -devel package contains the compiler):

yum install -y java-1.8.0-openjdk-devel

c. Install Maven 3:

yum install -y maven

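2. Install the cm_ext tools (clone them under /usr/local so that the validator path used in the later steps matches):

a. cd /usr/local && git clone https://github.com/cloudera/cm_ext.git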

b. cd cm_ext/validator

c. mvn package
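Before moving on, it can help to confirm that the build tools are actually on the PATH. The snippet below is a small sketch of such a check. missing_tools is a hypothetical helper, and the jar path in the usage comment assumes cm_ext was cloned under /usr/local, as the validator commands later in this article expect.

```shell
# Hypothetical helper: print every command from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage:
#   missing=$(missing_tools git java mvn)
#   [ -z "$missing" ] || echo "Still missing: $missing"
#   [ -f /usr/local/cm_ext/validator/target/validator.jar ] || echo "validator.jar not built"
```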

Creating the Conda Custom Parcel

1. Create the parcel directory and its meta subdirectory, then copy the Miniconda installation into the parcel directory:

mkdir -p /usr/local/parcels/CONDA_PYTHON-3.6.10-0

cd /usr/local/parcels/CONDA_PYTHON-3.6.10-0

mkdir meta

(cd /usr/local/miniconda3 && tar cpf - .) | tar xpf -

2. Create a meta/parcel.json file. The name and version must match the parcel directory name (NAME-VERSION), and all quotes must be plain ASCII double quotes:

{
  "schema_version": 1,
  "name": "CONDA_PYTHON",
  "version": "3.6.10-0",
  "setActiveSymlink": true,
  "depends": "",
  "replaces": "",
  "conflicts": "",
  "provides": [],
  "scripts": {
    "defines": "python_conda_env.sh"
  },
  "packages": [],
  "components": [
    {
      "name": "miniconda3",
      "version": "4.10.3",
      "pkg_version": "4.10.3",
      "pkg_release": "4.10.3"
    },
    {
      "name": "python",
      "version": "3.6.10",
      "pkg_version": "3.6.10",
      "pkg_release": "3.6.10"
    }
  ],
  "users": {
    "spark": {
      "longname": "Spark",
      "home": "/var/lib/spark",
      "shell": "/usr/sbin/nologin",
      "extra_groups": []
    }
  },
  "groups": []
}
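A mismatch between parcel.json and the parcel directory name is an easy mistake to make, because the directory is expected to be named NAME-VERSION after the "name" and "version" fields. The sketch below checks this before the validator is run. json_field and check_parcel_name are hypothetical helpers, and the sed expression is a plain-text match on the first occurrence of a field, not a real JSON parser, so it assumes one field per line with the top-level fields listed first.

```shell
# Hypothetical helper: print the first string value of a field such as "name".
# This is a line-based text match, not a JSON parser.
json_field() {
    sed -n 's/^[[:space:]]*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$2" | head -n 1
}

# Hypothetical helper: compare NAME-VERSION from meta/parcel.json with the directory name.
check_parcel_name() {
    parcel_dir=${1%/}                      # drop a trailing slash, if any
    json=$parcel_dir/meta/parcel.json
    expected=$(json_field name "$json")-$(json_field version "$json")
    actual=$(basename "$parcel_dir")
    if [ "$expected" = "$actual" ]; then
        echo "OK: $actual"
    else
        echo "MISMATCH: directory is $actual, parcel.json says $expected" >&2
        return 1
    fi
}

# Usage:
#   check_parcel_name /usr/local/parcels/CONDA_PYTHON-3.6.10-0
```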

3. Create a meta/python_conda_env.sh file containing the following two lines:

#!/bin/sh

exit 0

4. Validate everything:

a. java -jar /usr/local/cm_ext/validator/target/validator.jar -p /usr/local/parcels/CONDA_PYTHON-3.6.10-0/meta/parcel.json

b. java -jar /usr/local/cm_ext/validator/target/validator.jar -d /usr/local/parcels/CONDA_PYTHON-3.6.10-0/

5. Package the parcel:

a. cd /usr/local/parcels

b. tar -zcf /usr/local/parcels/CONDA_PYTHON-3.6.10-0-el7.parcel --owner=root --group=root CONDA_PYTHON-3.6.10-0

c. java -jar /usr/local/cm_ext/validator/target/validator.jar -f /usr/local/parcels/CONDA_PYTHON-3.6.10-0-el7.parcel
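The packaging step is where file name typos tend to creep in, since the parcel name appears in both the archive name and the directory name. One way to avoid that is to derive both from a single path, as in the sketch below. package_parcel is a hypothetical helper, it assumes GNU tar (for --owner and --group) and an absolute parcel path, and "el7" is simply the distribution suffix used in this article.

```shell
# Hypothetical helper: build <parcel_dir>-<distro>.parcel next to the parcel directory.
# Expects an absolute path to the parcel directory.
package_parcel() {
    parcel_dir=${1%/}   # e.g. /usr/local/parcels/CONDA_PYTHON-3.6.10-0
    distro=$2           # distribution suffix, e.g. el7
    parent=$(dirname "$parcel_dir")
    name=$(basename "$parcel_dir")
    out=$parent/$name-$distro.parcel
    # -C makes the archive contain one top-level directory named after the parcel;
    # --owner/--group record root as the owner of every entry.
    tar -C "$parent" --owner=root --group=root -zcf "$out" "$name" || return 1
    echo "$out"
}

# Usage:
#   parcel=$(package_parcel /usr/local/parcels/CONDA_PYTHON-3.6.10-0 el7)
#   java -jar /usr/local/cm_ext/validator/target/validator.jar -f "$parcel"
```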

6. Generate the SHA-1 hash file for the parcel (run this in /usr/local/parcels):

sha1sum < CONDA_PYTHON-3.6.10-0-el7.parcel | cut -d ' ' -f 1 > CONDA_PYTHON-3.6.10-0-el7.parcel.sha
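The .sha file has to contain only the 40-character hash of the .parcel file, so it is worth verifying after it is written and again after the files are copied. The following is a minimal sketch under that assumption. write_parcel_sha and verify_parcel_sha are hypothetical helper names.

```shell
# Hypothetical helper: write the SHA-1 of a parcel to <parcel>.sha (hash only, no file name).
write_parcel_sha() {
    parcel=$1
    sha1sum < "$parcel" | cut -d ' ' -f 1 > "$parcel.sha"
}

# Hypothetical helper: succeed only if <parcel>.sha still matches the parcel contents.
verify_parcel_sha() {
    parcel=$1
    [ "$(cat "$parcel.sha")" = "$(sha1sum < "$parcel" | cut -d ' ' -f 1)" ]
}

# Usage:
#   write_parcel_sha /usr/local/parcels/CONDA_PYTHON-3.6.10-0-el7.parcel
#   verify_parcel_sha /usr/local/parcels/CONDA_PYTHON-3.6.10-0-el7.parcel && echo "hash matches"
```

If the parcel is rebuilt, the hash file has to be regenerated as well, otherwise the two no longer match.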

7. Copy the .parcel and .parcel.sha files to the /opt/cloudera/parcel-repo directory on the Cloudera Manager (CM) node.

8. Change the ownership:

sudo chown cloudera-scm: /opt/cloudera/parcel-repo/CONDA_PYTHON-3.6.10-0-el7.parcel*

9. In Cloudera Manager, go to Parcels and click Check for New Parcels.

10. After the parcel is detected, click Distribute and then Activate.

11. Check the Python version on all nodes:

/opt/cloudera/parcels/CONDA_PYTHON/bin/python --version
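Running that command by hand on every node gets tedious on a larger cluster. The sketch below loops over a list of hosts instead. It is only an illustration: check_nodes, REMOTE_RUN and the host names in the usage comment are hypothetical, and it assumes password-less ssh access to each node.

```shell
# Path of the interpreter provided by the activated parcel (from the step above).
PARCEL_PYTHON=/opt/cloudera/parcels/CONDA_PYTHON/bin/python

# Hypothetical helper: check the parcel's Python version on each host.
# REMOTE_RUN is the command used to reach a host; it defaults to ssh.
check_nodes() {
    expected=$1
    shift
    status=0
    for host in "$@"; do
        # The remote command prints e.g. "Python 3.6.10"; keep the second word.
        actual=$(${REMOTE_RUN:-ssh} "$host" "$PARCEL_PYTHON --version" 2>&1 | awk '{print $2}')
        if [ "$actual" = "$expected" ]; then
            echo "$host: OK ($actual)"
        else
            echo "$host: expected $expected, got '$actual'"
            status=1
        fi
    done
    return $status
}

# Usage (host names are placeholders):
#   check_nodes 3.6.10 node1.example.com node2.example.com
```

The function returns a non-zero status if any node reports a different version, so it can be used as a post-activation check in a script.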