Metadata JSON

Each App Store image includes metadata contained inside a JSON file that specifies various interface and configuration options. Some of this metadata is visible in the App Store screen and/or the Create Cluster and Create New Job screens, and is described in the Interface Metadata section of this article. The interface metadata, along with other application configuration metadata, is contained inside the Catalog JSON file that is described in the Catalog JSON File section of this article.

Interface Metadata

The interface-related metadata included for use by the App Store interface consists of the following information:

Catalog JSON File

This article uses the CDH 5.4.3 with Cloudera Manager Catalog entry as an example for explaining the EPIC Catalog (App Store) entry JSON properties. The cdh54CM.json file is located in the /opt/bluedata/catalog/entries/system directory.

Note: This article describes Version 1 of the catalog JSON. This version is still supported; however, you may want to use later versions for authoring new Catalog entries. Version 2 (or later) will be required for any entry that makes use of a later version of the vAgent config API, and Version 3 (or later) will be required if you are supplying a custom logo.

Catalog entry properties can be broadly segregated into the following purposes:

Identification

The identification blob appears as follows:

                        "distro_id": "cdh54CM",
                        "label": {
                          "name": "CDH 5.4.3 with Cloudera Manager",
                          "description": "CDH 5.4.3 with MRv1/YARN and HBase support. Includes Pig, Hive, Hue and Spark."
                          },
                        "version": "2.0.1",
                        "epic_compatible_versions": ["3.4"],
                        "categories": [ "Hadoop", "HBase" ],

In this blob:

Components

The components blob appears as follows:

                        "image": {
                          "checksum": "b07e8cfea8a9c1a6cdc6990b1da29b9f",
                          "import_url": "http://s3.amazonaws.com/bluedata-vmimages/Cloudera-CDH-CM-5.4.3-v2.tgz"
                        },
                        "setup_package": {
                          "checksum": "7560c8841c1400e0e4a4ba3dac1ba8d7",
                          "import_url": "http://s3.amazonaws.com/bluedata-vmimages/cdh5-cm-setup.tgz"
                        },

In this blob:

Services

The services blob appears as follows:

                        "services": [
                          {
                            "id": "hbase_master",
                            "exported_service": "hbase",
                            "label": {
                              "name": "HMaster"
                            },
                            "endpoint": {
                              "url_scheme": "http",
                              "port": "60010",
                              "path": "/",
                              "is_dashboard": true
                            }
                          },
                          {
                            "id": "hbase_worker",
                            "label": {
                              "name": "HRegionServer"
                            },
                            "endpoint": {
                              "url_scheme": "http",
                              "port": "60030",
                              "path": "/",
                              "is_dashboard": true
                            }
                          },
                          {
                            "id": "hbase_thrift",
                            "label": {
                              "name": "HBase Thrift service."
                            }
                          },
                          ...
                        ],

In this example, services is a list of service objects. The defined services will be referenced by other elements of this JSON file to determine which services are active on which nodes within the cluster. That information will then be used to:

Setup scripts also use service identifiers to register those services with vAgent, so that necessary services can be properly started and restarted along with the virtual node. Setup scripts can also choose to wait for a vAgent-registered service to be active on a node in order to coordinate multi-node setup across the cluster.

Note: The "service" terminology does not correspond to a definition of "service" that is specific to some particular application or application framework. A "service" is any entity that can be used for any of the purposes described above. For example, a YARN resource manager is a service, as is sshd.

In this blob:

Note: The above values are currently only used when determining appropriate Add-On Image entries that can be added to a cluster, because those entries may have a requirement that the cluster provides specific exported services, or even exported services with specific qualifiers. For example, an add-on may have a dependence on the Hadoop exported service, or a more specific dependence on Hadoop with the YARN qualifier.

Note: The presence of an endpoint object triggers the creation of a NAT port mapping for this service, if EPIC is running inside an EC2 instance.
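
As an illustration of how the endpoint properties fit together, the following minimal Python sketch assembles a URL from a service object. The function name endpoint_url and the host address are hypothetical and are not part of EPIC; the sketch only assumes that the URL is formed as url_scheme://host:port followed by path, which this article does not state explicitly.

```python
def endpoint_url(service, host):
    """Build a URL from a service object's endpoint, or return None.

    Assumption: the URL is url_scheme://host:port + path. A service
    without an endpoint object (such as hbase_thrift) has no URL.
    """
    endpoint = service.get("endpoint")
    if endpoint is None:
        return None
    return "{}://{}:{}{}".format(
        endpoint["url_scheme"], host, endpoint["port"], endpoint.get("path", "")
    )


# Two service objects copied from the services blob above.
services = [
    {
        "id": "hbase_master",
        "exported_service": "hbase",
        "label": {"name": "HMaster"},
        "endpoint": {
            "url_scheme": "http",
            "port": "60010",
            "path": "/",
            "is_dashboard": True,
        },
    },
    {"id": "hbase_thrift", "label": {"name": "HBase Thrift service."}},
]

# "10.0.0.5" is a made-up node address used only for this example.
for service in services:
    print(service["id"], endpoint_url(service, "10.0.0.5"))
```

A service with no endpoint object yields None here, which matches the idea that only services with an endpoint are reachable through a URL.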

Node Roles

The node_roles blob appears as follows:

                        "node_roles": [
                          {
                            "id": "controller",
                            "cardinality": "1",
                            "anti_affinity_group_id": "CM",
                            "min_cores": "4",
                            "min_memory": "12288"
                          },
                          {
                            "id": "standby",
                            "cardinality": "1",
                            "anti_affinity_group_id": "CM"
                          },
                          {
                            "id": "arbiter",
                            "cardinality": "1",
                            "anti_affinity_group_id": "CM"
                          },
                          {
                            "id": "worker",
                            "cardinality": "1+"
                          }
                        ],

In this example, node_roles is a list of objects describing roles that may be deployed for this Catalog entry. Each role is a particular configuration instantiated from the entry's virtual node image and configured by the setup scripts. The configuration associated with a particular role is broadly left up to the setup scripts, and thus varies widely from entry to entry; however, there are certain constraints and semantics associated with specific roles in the current EPIC release (for non-Add-On entries):

The properties of each role object are:

Anti-affinity is typically used to reduce the physical resources shared by a set of nodes, to make it less likely for a single physical fault to affect them all. This constraint only applies to nodes within a given cluster; anti-affinity is not enforced among nodes from different clusters.
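
The following Python sketch shows one way a tool could check a proposed cluster layout against the role objects above. It rests on two assumptions that this article does not spell out: that a cardinality of "1" means exactly one node and "1+" means one or more, and that anti-affinity means nodes in the same group should not share a physical host. The function names and the placement format are hypothetical.

```python
import re
from collections import defaultdict


def parse_cardinality(spec):
    """Return (minimum, maximum); maximum is None when unbounded.

    Assumption: "N" means exactly N nodes and "N+" means N or more.
    """
    match = re.fullmatch(r"(\d+)(\+?)", spec)
    if match is None:
        raise ValueError("unrecognized cardinality: %r" % spec)
    minimum = int(match.group(1))
    maximum = None if match.group(2) else minimum
    return minimum, maximum


def count_is_valid(spec, count):
    """Check a node count against a cardinality string."""
    minimum, maximum = parse_cardinality(spec)
    return count >= minimum and (maximum is None or count <= maximum)


def shared_hosts(placements, roles):
    """Find hosts holding several nodes of one anti-affinity group.

    placements is a list of (role_id, host) pairs (hypothetical format).
    Returns {(group_id, host): [role_id, ...]} for each violation.
    """
    group_of = {role["id"]: role.get("anti_affinity_group_id") for role in roles}
    seen = defaultdict(list)
    for role_id, host in placements:
        group = group_of.get(role_id)
        if group is not None:
            seen[(group, host)].append(role_id)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# Role objects copied from the node_roles blob above.
node_roles = [
    {"id": "controller", "cardinality": "1", "anti_affinity_group_id": "CM",
     "min_cores": "4", "min_memory": "12288"},
    {"id": "standby", "cardinality": "1", "anti_affinity_group_id": "CM"},
    {"id": "arbiter", "cardinality": "1", "anti_affinity_group_id": "CM"},
    {"id": "worker", "cardinality": "1+"},
]

# Host names are made up for this example.
placements = [("controller", "hostA"), ("standby", "hostA"), ("worker", "hostA")]
print(shared_hosts(placements, node_roles))
```

In this layout the controller and standby nodes land on the same host, so the check reports them; the worker has no anti-affinity group and is ignored.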

Configuration

The configuration blob appears as follows:

                        "config": {
                          "selected_roles": [
                             ...
                            ],
                          "node_services": [
                             ...
                            ],
                          "config_meta": [
                             ...
                            ],
                          "config_choices": [
                             ...
                            ],

The remainder of the JSON file describes which node roles will be deployed into the cluster, and which services will be present on any node with a given role. This information may depend on choices provided by the UI/API user when they are creating the cluster.

This structure means that the top-level selected_roles, node_services, and config_meta property values will apply regardless of any user-provided input about choice selections. User-provided input may then have consequences such as activating additional roles and/or services in the cluster, and/or adding more elements to config_meta.

For example, in the CDH 5.4.3 JSON:

Selected Roles

The selected_roles blob appears as follows:

                        "selected_roles": [
                          "controller",
                          "standby",
                          "arbiter",
                          "worker"
                        ],

The value of the selected_roles property is a list of role IDs. The example shown above is taken from the choice selection that activates HBase support.

Note: In this particular Catalog entry, the top-level selected_roles property is an empty list; no roles at all will be activated unless the user provides some input (choice selections). This is a valid arrangement and reflects the fact that, for this Catalog entry, some choices must be made before any usable application framework can be provided in this cluster. By contrast, some other Catalog entries have roles and services that are always selected.

Node Services

The node_services blob appears as follows:

                        "node_services": [
                          {
                            "role_id": "controller",
                            "service_ids": [ "ganglia", "ganglia_api", "ssh", "gmetad", "gmond", "httpd" ]
                          },
                          {
                            "role_id": "standby",
                            "service_ids": [ "ssh", "gmond" ]
                          },
                          {
                            "role_id": "arbiter",
                            "service_ids": [ "ssh", "gmond" ]
                          },
                          {
                            "role_id": "worker",
                            "service_ids": [ "ssh", "gmond" ]
                          }
                        ],

Each element of this list is a node_services object that describes the services available on a given role. The role may or may not be selected; this data structure simply indicates that if a certain role is selected (according to choice selections), then these are the services a node with that role will provide. The top-level node_services in this example Catalog entry are all of the ancillary services that don't depend on choices like HBase support or MR type.
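
A lookup over this list can be sketched in a few lines of Python. The function name services_for_role is hypothetical; the sketch simply collects the service_ids of every node_services object whose role_id matches, which also covers the case where several objects (for example, from different choice selections) name the same role.

```python
def services_for_role(node_services, role_id):
    """Return the service IDs listed for a role, in order, without duplicates."""
    service_ids = []
    for entry in node_services:
        if entry["role_id"] != role_id:
            continue
        for service_id in entry["service_ids"]:
            if service_id not in service_ids:
                service_ids.append(service_id)
    return service_ids


# node_services objects copied from the blob above.
node_services = [
    {"role_id": "controller",
     "service_ids": ["ganglia", "ganglia_api", "ssh", "gmetad", "gmond", "httpd"]},
    {"role_id": "standby", "service_ids": ["ssh", "gmond"]},
    {"role_id": "arbiter", "service_ids": ["ssh", "gmond"]},
    {"role_id": "worker", "service_ids": ["ssh", "gmond"]},
]

print(services_for_role(node_services, "worker"))
```

A role that does not appear in the list yields an empty result rather than an error, which mirrors the point that an entry describes services only for roles that end up being selected.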

The properties of each node_services object are:

Config Metadata

The config_meta blob appears as follows:

                        "config_meta": {
                          "streaming_jar": "/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar",
                          "impala_jar_version": "0.1-SNAPSHOT",
                          "cdh_major_version": "CDH5",
                          "cdh_full_version": "5.4.3",
                          "cdh_parcel_version": "5.4.3-1.cdh5.4.3.p0.6",
                          "cdh_parcel_repo": "http://archive.cloudera.com/cdh5/parcels/5.4.3"
                        },

In this example, config_meta is a key-value store. These values are only used by the scripts in the guest package and are thus completely opaque to EPIC. These values may be referenced during node setup. For example, the streaming_jar value is conventionally referenced by the script that runs Hadoop Streaming jobs.

Choice selections may cause the definition of multiple config_meta lists that together form the KV store visible to the in-guest scripts. To avoid confusion, key conflicts are not allowed. For example, it is legal for mutually exclusive choice selections to define different values for a key, but it is not legal for the same key to be defined more than once when composing the KV store that results from a particular set of choice selections.
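
The no-conflict rule can be illustrated with a short Python sketch that composes the KV store from several config_meta fragments and rejects a repeated key. The function name merge_config_meta and the hbase_site_dir key are hypothetical and used only for illustration; how EPIC itself reports a conflict is not described in this article.

```python
def merge_config_meta(*fragments):
    """Combine config_meta fragments into one KV store.

    Raises ValueError if any key is defined more than once, mirroring
    the rule that key conflicts are not allowed.
    """
    merged = {}
    for fragment in fragments:
        for key, value in fragment.items():
            if key in merged:
                raise ValueError("config_meta key defined more than once: " + key)
            merged[key] = value
    return merged


# Two keys copied from the top-level config_meta shown above.
top_level = {
    "cdh_major_version": "CDH5",
    "cdh_full_version": "5.4.3",
}

# Hypothetical fragment contributed by a choice selection.
from_selection = {"hbase_site_dir": "/etc/hbase/conf"}

kv_store = merge_config_meta(top_level, from_selection)
print(sorted(kv_store))
```

Mutually exclusive selections never appear in the same call, so each may define its own value for a key without triggering the error.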

Config Choices

The config_choices blob appears as follows:

                        "config_choices": [
                          {
                            "id": "hbase",
                            "type": "boolean",
                            "label": {
                              "name": "HBase"
                            },
                            "selections": [
                              {
                                "id": false
                              },
                              {
                                "id": true,
                                "config": {
                                  ...
                                }
                              }
                            ]
                          },
                          {
                            "id": "mrtype",
                            "type": "multi",
                            "label": {
                              "name": "MR Type"
                            },
                            "selections": [
                              {
                                "id": "mrv1",
                                "label": {
                                  "name": "MRv1"
                                },
                                "config": {
                                  ...
                                }
                              },
                              {
                                "id": "yarn",
                                "label": {
                                  "name": "YARN"
                                },
                                "preferred": true,
                                "config": {
                                  "selected_roles": [
                                    "controller",
                                    "worker"
                                  ],
                                  "node_services": [
                                    ...
                                  ],
                                  "config_choices": [
                                    {
                                      "id": "yarn_ha",
                                      "type": "boolean",
                                      "label": {
                                        "name": "YARN and HDFS High Availability"
                                      },
                                      "selections": [
                                        {
                                          "id": false
                                        },
                                        {
                                          "id": true,
                                          "config": {
                                            ...
                                          }
                                        }
                                      ]
                                    }
                                  ],
                                  "config_choices":[
                                    {
                                      "label": {
                                        "name": "CLouderaManagerServer"
                                      },
                                      "type": "string",
                                      "id": "clouderamanager-server"
                                    }
                                  ]
                                }
                              }
                            ]
                          }
                        ],

This blob lists the choices available to the API/UI user when creating a cluster. Each choice has some number of valid selections (either Boolean or multiple-choice) that can be provided to satisfy that choice. A given selection can then contain a nested config, as described previously.

In this example, one choice describes whether or not to activate HBase support. Another describes the choice between using MRv1 or YARN. If YARN is selected, then there is a further choice as to whether to activate YARN and HDFS High Availability.

Each of these choices activates certain roles for deployment and selects certain services to be present on nodes of given roles.

This structure is fairly generic; however, EPIC constrains the choices to those currently defined among the various Catalog entries provided as part of the EPIC release. Please contact BlueData support if you wish to define choices in a Catalog entry that you are authoring.
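
To make the nesting concrete, the following Python sketch walks a config object and gathers the roles and services that a given set of choice selections would activate. It is a simplified reading of the structure described in this article, not EPIC's implementation. The function name resolve_config and the shape of the selections argument (a mapping from choice ID to selection ID) are hypothetical, and the roles and services placed under the hbase and yarn_ha selections stand in for content that is elided in the listing above.

```python
def resolve_config(config, selections):
    """Collect the roles and per-role services activated by selections.

    selections maps a choice ID to the chosen selection ID, for example
    {"mrtype": "yarn", "yarn_ha": True} (hypothetical input format).
    """
    roles = []
    services = {}  # role ID -> list of service IDs

    def visit(cfg):
        for role in cfg.get("selected_roles", []):
            if role not in roles:
                roles.append(role)
        for entry in cfg.get("node_services", []):
            bucket = services.setdefault(entry["role_id"], [])
            for service_id in entry["service_ids"]:
                if service_id not in bucket:
                    bucket.append(service_id)
        for choice in cfg.get("config_choices", []):
            if choice["id"] not in selections:
                continue
            chosen = selections[choice["id"]]
            for selection in choice.get("selections", []):
                # A selection without a nested config activates nothing extra.
                if selection["id"] == chosen and "config" in selection:
                    visit(selection["config"])

    visit(config)
    return roles, services


# Abbreviated config; nested contents are illustrative placeholders.
config = {
    "selected_roles": [],
    "node_services": [
        {"role_id": "controller", "service_ids": ["ssh", "gmond"]},
        {"role_id": "worker", "service_ids": ["ssh", "gmond"]},
    ],
    "config_choices": [
        {"id": "hbase", "type": "boolean", "selections": [
            {"id": False},
            {"id": True, "config": {
                "selected_roles": ["controller", "worker"],
                "node_services": [
                    {"role_id": "controller", "service_ids": ["hbase_master"]},
                    {"role_id": "worker", "service_ids": ["hbase_worker"]},
                ],
            }},
        ]},
        {"id": "mrtype", "type": "multi", "selections": [
            {"id": "mrv1", "config": {"selected_roles": ["controller", "worker"]}},
            {"id": "yarn", "config": {
                "selected_roles": ["controller", "worker"],
                "config_choices": [
                    {"id": "yarn_ha", "type": "boolean", "selections": [
                        {"id": False},
                        {"id": True, "config": {
                            "selected_roles": ["standby", "arbiter"],
                        }},
                    ]},
                ],
            }},
        ]},
    ],
}

roles, services = resolve_config(
    config, {"hbase": False, "mrtype": "yarn", "yarn_ha": True}
)
print(roles)
print(services)
```

The top-level node_services entries are always collected, while roles appear only once a selection activates them. With no selections at all, the role list stays empty, matching the behavior described for this Catalog entry's top-level selected_roles.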

The properties of each choice object are: