Comprehensive Guide to the deploy.sh Script in VMware Aria Automation

Posted on February 25, 2025 by admin

Introduction to the deploy.sh Script

The deploy.sh script is a fundamental tool in the VMware Aria Automation ecosystem (formerly vRealize Automation), responsible for deploying, configuring, and managing all components of this advanced environment. Located in the /opt/scripts/ directory on the Aria Automation virtual machine, it serves as the central orchestration point for the entire system.

The main tasks of the deploy.sh script include:

  1. Environment Initialization – The script prepares all necessary Kubernetes resources, namespaces, and infrastructure components. It creates the basic data structures that will be used by other system components. This stage is fundamental to the entire process as it establishes the foundation on which subsequent application layers will be built.


  2. Component Configuration and Deployment – Deploy.sh manages the installation and configuration of dozens of microservices that together form the Aria Automation ecosystem. Each component has specific configuration requirements, dependencies, and runtime parameters that the script manages in an automated manner. The script uses Helm Charts to describe and deploy each component, ensuring repeatability and reliability of the process.


  3. Database Management – The script coordinates the deployment, configuration, and migration of PostgreSQL databases. It handles both scenarios with a single database instance (single-DB) and more advanced configurations with dedicated databases for each service (multi-DB). Automated backup mechanisms, data migration, and integrity verification ensure data security during updates or system reconfiguration.


  4. Distributed Service Synchronization – In microservice architecture, proper synchronization and communication between components is critical. The deploy.sh script manages the configuration of communication systems (RabbitMQ), service registration, endpoint establishment, and routing configuration, ensuring a cohesive ecosystem of cooperating services.


  5. Comprehensive Lifecycle Management – Deploy.sh not only deploys new instances but also manages the full application lifecycle – from initial installation, through updates and reconfigurations, to controlled shutdown. The script implements idempotent operations that can be safely repeated without risk of system damage.


The deployment process using the deploy.sh script typically begins after the Aria Automation virtual machine starts. The administrator executes the script, which initiates the installation process for all necessary services and components, using Kubernetes as the container orchestration platform and Helm as the package manager.

In situations requiring environment restart, configuration cleanup, or data migration, the deploy.sh script offers a range of configuration options. For example, when it’s necessary to stop services and clean the environment, you can use the --shutdown option (or the deprecated --onlyClean option). The recommended sequence of actions in such a case includes:

  1. Stopping services – First, stop all active services using the dedicated svc-stop.sh script:


    /opt/scripts/svc-stop.sh --force

  2. Waiting period – Then it’s recommended to wait about 120 seconds, which gives time for safe termination of all processes and resource release:


    sleep 120

  3. Environment cleanup – After this period, you can run the deploy.sh script with the cleanup option:


    /opt/scripts/deploy.sh --shutdown

  4. Redeployment – After the cleanup process is complete, you can redeploy all services by running the script without additional parameters:


    /opt/scripts/deploy.sh

This sequence ensures safe and controlled stopping, cleaning, and restarting of the entire environment, minimizing the risk of problems related to improper process termination or remnants of previous configuration.
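
The four steps above can be wrapped into a single helper. The sketch below is hypothetical (the /opt/scripts paths exist only on an Aria Automation appliance), so a DRY_RUN mode, enabled by default here, prints each step instead of running it:

```shell
#!/bin/bash
# Hypothetical wrapper around the documented restart sequence.
# With DRY_RUN=1 (the default here) each step is printed, not executed,
# since the /opt/scripts paths exist only on the appliance itself.
restart_aria() {
  local run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  $run /opt/scripts/svc-stop.sh --force    # 1. stop all services
  $run sleep 120                           # 2. allow safe termination
  $run /opt/scripts/deploy.sh --shutdown   # 3. clean the environment
  $run /opt/scripts/deploy.sh              # 4. redeploy everything
}
```

Running the same function with DRY_RUN=0 on the appliance would execute the real sequence.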

Detailed Architecture of the deploy.sh Script

1. Help Configuration and Advanced Argument Handling

The deploy.sh script begins by defining the displayHelp() function, which serves as interactive documentation for the tool, presenting all available options with their descriptions. This function is key for users as it allows them to quickly familiarize themselves with the script’s capabilities without having to analyze its source code.

displayHelp() {
 echo "Deploy or re-deploy all Prelude services"
 echo ""
 echo "Usage:"
 echo "./deploy.sh [Options]"
 echo ""
 echo "Options:"
 echo "-h --help Display this message."
 echo "--deleteDatabases Delete postgres databases of all services."
 echo "--shutdown Shutdown gracefully all services."
 echo "--withHttpProxy Enable Http proxy and route all outgoing service traffic to it. Proxy port: 30128. Proxy web console port: 30333."
 echo "--onlyClean Deprecated. Use --shutdown instead."
 echo "--quick Internal use only. Reduce/eliminates some internal timeouts."
 echo "--multiDb Deploy a separate DB server for each service. In cluster deployments DB pods with primary roles are distributed evenly across all nodes."
 echo "--legacyEndpointRegistration Create default endpoints using deploy script instead of provisioning service."
 echo "--enableAdapterHostSvc Allowed values: true or false. Enable/disable adapter-host-service as an adapters host."
 # Additional undocumented options...
}

After defining the help function, the script implements a command-line argument processing mechanism. This mechanism is flexible – it handles both short (single-letter) options and long formats, as well as parameters with values passed after the “=” sign. The following code fragment demonstrates this implementation:

while [ "$1" != "" ]; do
 PARAM=$(echo "$1" | awk -F= '{print $1}')
 VALUE=$(echo "$1" | awk -F= '{print $2}')
 case $PARAM in
 -h | --help)
 displayHelp
 exit
 ;;
 --deleteDatabases)
 DELETE_DATABASES=true
 ;;
 --multiDb)
 MULTI_DB=true
 ;;
 # ... other options
 --enableAdapterHostSvc)
 if [[ "$VALUE" == "true" ]]; then
 ENABLE_ADAPTER_HOST_SVC=true
 else
 ENABLE_ADAPTER_HOST_SVC=false
 fi
 ;;
 *)
 echo "Error: Unknown parameter \"$PARAM\""
 echo ""
 displayHelp
 exit 1
 ;;
 esac
 shift
done

This code fragment uses a while loop to iterate through all arguments passed to the script. For each argument:

  1. It extracts the parameter name and its value using awk -F=, which allows handling formats like --parameter=value
  2. Uses a case statement to match the parameter to a known option
  3. Depending on the option, sets the appropriate configuration variable
  4. For options with values (like --enableAdapterHostSvc), analyzes the passed value and sets the variable accordingly
  5. If the parameter doesn’t match any known option, displays an error, help, and exits
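
The same parsing pattern can be reduced to a self-contained sketch. Only two illustrative options are handled here, not the full deploy.sh option set:

```shell
#!/bin/bash
# Minimal sketch of the --key=value parsing loop used by deploy.sh
# (illustrative subset of options; variable names follow the script).
parse_args() {
  MULTI_DB=false
  ENABLE_ADAPTER_HOST_SVC=false
  while [ "$1" != "" ]; do
    local param value
    param=$(echo "$1" | awk -F= '{print $1}')
    value=$(echo "$1" | awk -F= '{print $2}')
    case $param in
      --multiDb)
        MULTI_DB=true
        ;;
      --enableAdapterHostSvc)
        if [ "$value" = "true" ]; then
          ENABLE_ADAPTER_HOST_SVC=true
        else
          ENABLE_ADAPTER_HOST_SVC=false
        fi
        ;;
      *)
        echo "Unknown parameter: $param" >&2
        return 1
        ;;
    esac
    shift
  done
}
```

Calling `parse_args --multiDb --enableAdapterHostSvc=true` leaves both flags set to true in the caller's shell.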

Through this mechanism, the script establishes key configuration flags that determine its further operation:

  • DELETE_DATABASES – controls whether databases should be deleted and initialized anew, which is useful for migration or troubleshooting corrupted data
  • ENABLE_RESOURCE_LIMITS – determines whether Kubernetes resource limits (CPU, memory) should be applied, which can affect system performance and stability
  • SHUTDOWN – decides whether services should be stopped, which is used in environment cleanup scenarios
  • MULTI_DB – configures whether the system should use dedicated databases for each service, which is recommended in production environments for better isolation and scalability
  • ENABLE_ADAPTER_HOST_SVC – enables or disables the adapter host service, which is responsible for handling integration adapters
  • ENABLE_EXTENSIBILITY_SUPPORT – controls support for extensions, allowing platform customization for specific organizational needs

It’s worth noting that the script also handles deprecated options (marked as “Deprecated”), maintaining backward compatibility with earlier versions or existing automation scripts. For example, the --onlyClean option is deprecated but still supported, and the script directs the user to the newer --shutdown option.

The argument processing mechanism not only sets internal configuration variables but also implements input validation – it checks the correctness of values for parameters such as --enableAdapterHostSvc, which expect specific values (true/false). This prevents errors resulting from incorrect input data and ensures configuration consistency.

The completion of the argument processing section establishes a complete configuration profile for the current script invocation, which determines all subsequent operations and decisions made during the deployment process. This allows the administrator to precisely customize the deployment process to specific environment needs and requirements.

2. Advanced Logging System

The logging system implemented in the deploy.sh script is an example of a well-designed, multi-layered mechanism that not only documents the deployment process but also serves as an invaluable diagnostic tool.

The first step in configuring the logging system is establishing unique, time-stamped log files and rotation mechanisms:

log_timestamp=$(date --utc +'%Y-%m-%d-%H-%M-%S')
if [[ -f /var/log/deploy.log ]] && [[ ! -h /var/log/deploy.log ]]; then
 mv /var/log/deploy.log /var/log/deploy-old.log
fi
exec > >(tee -a "/var/log/deploy-$log_timestamp.log") 2>&1
ln -sfT "deploy-$log_timestamp.log" /var/log/deploy.log

This code fragment performs several key actions:

  1. Timestamp Generation – The script creates a unique timestamp based on the current UTC time in year-month-day-hour-minute-second format, ensuring that each log file has a unique name, facilitating identification and organization of historical logs.


  2. Handling Existing Logs – The script checks if the /var/log/deploy.log file already exists and is not a symbolic link. If so, it moves it to deploy-old.log, preserving the previous log as a backup. This prevents the loss of important information from previous runs.


  3. Output Redirection – Using the construct exec > >(tee -a "log_file") 2>&1, the script redirects both standard output (stdout) and the error stream (stderr) to the tee command, which simultaneously:


  • Displays all messages on the console (allowing the administrator to monitor progress)
  • Saves the same messages to the log file (the -a option adds data to the file instead of overwriting it)
  4. Symbolic Link Creation – The script creates a symbolic link /var/log/deploy.log pointing to the newest log file, providing a constant, predictable access point to current information, regardless of the unique file name.
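
The same rotation-and-symlink pattern can be exercised outside the appliance by writing under a temporary directory instead of /var/log. A sketch, using the same GNU date and ln options as the script:

```shell
#!/bin/bash
# Sketch of the timestamped-log + stable-symlink pattern, using a temp
# directory instead of /var/log so it can run anywhere.
logdir=$(mktemp -d)
log_timestamp=$(date --utc +'%Y-%m-%d-%H-%M-%S')
echo "deployment started" > "$logdir/deploy-$log_timestamp.log"
# Stable access point: deploy.log always points at the newest run.
ln -sfT "deploy-$log_timestamp.log" "$logdir/deploy.log"
```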

After configuring the basic logging infrastructure, the script defines the log_stage() function, which introduces a hierarchical structure to the logs:

log_stage() {
 set +x
 echo
 echo "========================="
 echo "[$(date "+%Y-%m-%d %H:%M:%S.%3N%z")] $@"
 echo "========================="
 echo
 set -x
}

This elegant function:

  1. Temporarily Disables Debug Mode – The set +x instruction turns off command display, making section headers more readable in logs.


  2. Formats a Clear Separator – The function creates a visually distinctive block with horizontal lines, making it easier to browse long logs and quickly locate the beginning of each section.


  3. Adds a Precise Timestamp – Date and time are displayed with millisecond precision and timezone information, which is invaluable for performance analysis and diagnosing time-related issues.


  4. Passes the Message – The function displays the passed message describing the beginning section.


  5. Restores Debug Mode – After printing the header, the set -x instruction restores debug mode, in which all executed commands are displayed.


The use of this function throughout the script creates a clear, hierarchical log structure where each main stage of the deployment process is clearly marked. For example:

=========================
[2023-05-15 14:30:27.123 +0000] Creating kubernetes namespaces
=========================

+ k8s_create_namespace ingress
...

=========================
[2023-05-15 14:31:15.456 +0000] Applying ingress certificate
=========================

+ /opt/scripts/prepare_certs.sh
...

This structure significantly facilitates both manual log browsing and automatic analysis using parsing tools.

Additionally, the logging system works with the error handling mechanism, ensuring that in case of deployment failure, a complete diagnostic package is automatically generated:

on_exit() {
 if [ $? -ne 0 ]; then
 echo "Deployment failed. Collecting log bundle ..."
 ( cd /root; vracli log-bundle )
 fi
 # ... other cleanup operations
}

trap on_exit EXIT

The vracli log-bundle tool invoked in case of error creates a comprehensive package containing not only deploy.sh script logs but also:

  • Logs of all Kubernetes system components
  • Configuration of Kubernetes resources (pods, services, deployments, secrets)
  • Information about database and service status
  • Network and connection configuration
  • Resource metrics (CPU, memory, disk)

This multi-layered logging system forms the foundation of diagnostic processes and problem-solving in the Aria Automation environment, providing:

  • Clear documentation of the deployment process
  • Precise tracking of occurring issues
  • Historical analysis of previous deployments
  • Automatic generation of complete diagnostic packages
  • Hierarchical structure facilitating analysis

3. Comprehensive Error Handling and Safety Mechanisms

The deploy.sh script implements a multi-layered, well-thought-out system of error handling and safety mechanisms that ensures the reliability of the deployment process even in the event of unforeseen problems. This system consists of several key components:

The central element of error handling is the on_exit function, which is called automatically when the script ends, regardless of the reason:

on_exit() {
 if [ $? -ne 0 ]; then
 echo "Deployment failed. Collecting log bundle ..."
 ( cd /root; vracli log-bundle )
 fi

 # Remove temporary helm_upstall check directory
 if [ -n "$UPSTALL_STATUS_DIR" ]; then
 clear-helm-upstalls-status $UPSTALL_STATUS_DIR true
 fi

 # Clear the value of property cache.timeout in vracli.conf file
 # Do not generate new service status
 vracli service status --unset-config service.status.cache.lifetime || true

 rm -rf /tmp/deploy.tmp.*
}

trap on_exit EXIT

This function performs several important tasks:

  1. Error Detection – It checks the exit code of the last command ($?). If it’s non-zero (indicating an error), it initiates the diagnostic procedure.


  2. Automatic Diagnostics – In case of an error, it generates a complete diagnostic package using vracli log-bundle. This package contains not only logs but also detailed information about system state, which is invaluable during problem analysis.


  3. Temporary Resource Cleanup – Regardless of the outcome, the function ensures the removal of temporary files and directories (UPSTALL_STATUS_DIR, /tmp/deploy.tmp.*), preventing garbage from being left in the system.


  4. Cache Configuration Reset – It restores default cache settings for service status, ensuring that the next call to vracli service status will generate fresh data.


The trap on_exit EXIT instruction registers this function as a handler for the EXIT signal, meaning it will be called regardless of whether the script ends normally or prematurely (e.g., due to an error or user interruption).
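
The behavior of such a handler can be demonstrated in a self-contained subshell. This is a sketch only: echo stands in for the real vracli log-bundle call:

```shell
#!/bin/bash
# Demo of the trap-on-EXIT pattern: the handler reads $? to decide
# whether to run failure diagnostics, then always performs cleanup.
exit_demo() {
  bash -c '
    on_exit() {
      if [ $? -ne 0 ]; then
        echo "collecting log bundle"   # stands in for vracli log-bundle
      fi
      echo "cleanup done"
    }
    trap on_exit EXIT
    exit "$1"
  ' _ "$1"
}
```

A clean exit prints only the cleanup line; a non-zero exit additionally triggers the diagnostic branch.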

Additionally, the script defines the die() function, which provides controlled termination in case a critical error is detected:

die() {
 local msg=$1
 local exit_code=$2

 if [ $# -lt 2 ]; then
 exit_code=1
 fi

 set +x
 clear || true
 echo $msg
 exit $exit_code
}

This function:

  1. Accepts an Error Message – The first argument is a human-readable problem description to be displayed.


  2. Accepts a Custom Exit Code – The second, optional argument allows specifying an error code (default is 1), which can be used by automation systems to differentiate error types.


  3. Disables Debug Mode – The set +x instruction ensures that the error message will be clearly visible, without mixing with debugging output.


  4. Clears the Screen – The clear || true command clears the screen (if possible), increasing the visibility of the error message.


  5. Displays the Message and Exits – It prints the error message and ends the script with the specified exit code.


The die() function is strategically used at key points in the script where error conditions can be detected. For example:

if ! vracli status first-boot -w 300; then
 die "Timeout expired"
fi

In this case, if the Kubernetes cluster doesn’t reach the “first-boot” state within 300 seconds, the script will be safely interrupted with a readable “Timeout expired” message.
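
The default-exit-code behavior of die() can be checked in isolation by running it in a child shell. A sketch (the clear call is omitted here since no terminal is assumed):

```shell
#!/bin/bash
# Demo of die(): exit code defaults to 1 when no second argument is
# given, or uses the explicit code passed as the second argument.
die_demo() {
  bash -c '
    die() {
      local msg=$1
      local exit_code=$2
      if [ $# -lt 2 ]; then
        exit_code=1
      fi
      echo "$msg"
      exit "$exit_code"
    }
    die "$@"
  ' _ "$@"
}
```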

In addition to these main mechanisms, the script uses a number of advanced error handling techniques:

  1. Timeout Control – For long-running operations, the script applies the timeout command, which automatically interrupts the operation if it exceeds a specified time:
timeout 300s bash -c wait_deploy_health
  2. Retry Mechanisms – For operations that may temporarily fail (e.g., due to network delays), the script uses the retry_backoff function from the retry_utils.sh module:
retry_backoff "5 15 45" "Failed to load existing vRO config" "load_existing_config"

This function tries to perform the operation, and in case of failure, waits a specified time (5, 15, 45 seconds) before subsequent attempts.

  3. Error Handling in Parallel Processes – For operations performed in parallel (in the background), the script implements a status checking mechanism:
check-helm-upstalls-status() {
 # ... code checking status
 if [ "${failure_count}" -gt "0" ]; then
 log_stage "There are failed install/upgrade of helm releases"
 return 1
 fi
}
  4. Selective Error Ignoring – In some cases, the script deliberately ignores specific errors when they are not critical:
vracli ntp show-config || true

The || true operator means that failure of the vracli ntp show-config command will not cause the script to terminate.

  5. Dynamic Adaptation to Error Conditions – In some scenarios, the script takes specific remedial actions instead of simply terminating:
if [[ "$retry_count" -gt 5 ]]; then
 log_stage "Too many retries. Attempting recovery procedure..."
 # ... recovery code
fi
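
The retry_backoff helper itself is not reproduced in the excerpts above. A hedged sketch consistent with its documented call signature (a list of delays, an error message, and a command) might look like this; the real retry_utils.sh implementation may differ:

```shell
#!/bin/bash
# Hedged sketch of retry_backoff: try the command once, and on failure
# retry after each delay in the list; report the error if all attempts fail.
retry_backoff() {
  local delays="$1"
  local err_msg="$2"
  local cmd="$3"
  eval "$cmd" && return 0
  local d
  for d in $delays; do
    sleep "$d"
    eval "$cmd" && return 0
  done
  echo "$err_msg" >&2
  return 1
}
```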

All of these mechanisms create a layered, resilient system that provides:

  • Reliable detection and reporting of errors
  • Automatic diagnostics and problem information collection
  • Controlled termination in case of critical errors
  • Intelligent retry of operations in case of temporary problems
  • Proper resource cleanup, even in case of failure
  • Support for parallel operation execution with safety preserved

Such comprehensive error handling is key to the reliable deployment of complex systems like VMware Aria Automation, where the installation process includes many interdependent components and can be susceptible to various problems – from temporary network failures to resource issues to configuration conflicts.

4. Multi-layered Environment State Checking

One of the key aspects of the deploy.sh script is the implementation of advanced environment state checking mechanisms that ensure all components are properly prepared before starting significant deployment operations. This multi-layered system of environment state verification is essential for ensuring stability and predictability of the deployment process.

The central element of this system is the wait_deploy_health function, which performs cyclical health checks until a proper state is achieved:

# Run a health check with the deploy profile
wait_deploy_health() {
 while true; do
 /opt/health/run-once.sh deploy && break || sleep 5
 done
}
export -f wait_deploy_health

log_stage "Waiting for deploy healthcheck"
timeout 300s bash -c wait_deploy_health

This code fragment performs the following tasks:

  1. Cyclical Check Function Definition – The wait_deploy_health function implements an infinite loop that:
  • Calls the /opt/health/run-once.sh script with the “deploy” profile
  • If the script ends successfully (exit code 0), breaks the loop (break)
  • Otherwise, waits 5 seconds before the next attempt
  2. Function Export to Subprocesses – The export -f wait_deploy_health instruction allows using this function in subprocesses, which is necessary for operation with the timeout command


  3. Check Start Logging – The log_stage function documents the beginning of the health check waiting process


  4. Time Limit – The timeout 300s command establishes a 5-minute time limit, after which, if the health check still fails, the operation will be interrupted


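The export -f plus timeout combination works because timeout starts a new bash process, which can only see the function if it was exported. A self-contained demo of the same pattern, with a file flag (FLAG_FILE is an invented stand-in for the health check):

```shell
#!/bin/bash
# Demo of the export -f + timeout polling pattern: the child bash
# started by `timeout` can call wait_for_flag only because the function
# was exported. FLAG_FILE stands in for the real health check condition.
wait_for_flag() {
  while true; do
    [ -f "$FLAG_FILE" ] && break || sleep 1
  done
}
export -f wait_for_flag
```

With the flag file present, `timeout 10s bash -c wait_for_flag` returns immediately; without it, timeout kills the poll loop after 10 seconds.
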
The /opt/health/run-once.sh script used in this process is a complex diagnostic tool that performs a series of specialized tests:

#!/bin/bash -l

PATH=$PATH:/sbin:/usr/sbin

if [ -z "${1}" ]; then
 echo "Health check profile required"
 exit 1
fi

rundir=$( mktemp -d )
cd "${rundir}"

# Run the requested health checks concurrently
/usr/bin/make --file=/opt/health/Makefile --jobs --keep-going --output-sync=target "${1}"
err=$?

cd ..
rm -rf "${rundir}"

exit "${err}"

This script:

  1. Requires a health check profile to be provided as an argument
  2. Creates a temporary working directory
  3. Uses the Make system (with a Makefile) to run multiple health tests in parallel
  4. The --jobs option allows parallel test execution, speeding up the process
  5. The --keep-going flag ensures that even if some tests fail, the remaining ones are still executed
  6. The --output-sync=target parameter ensures that output from parallel processes will not mix

In the /opt/health/Makefile file, various test profiles are defined, including the “deploy” profile, which checks key aspects of the environment:

  • Kubernetes API availability
  • Basic cluster component status
  • Network configuration
  • System resource availability
  • Infrastructure service status
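
The Makefile itself is not reproduced in the sources above; a hedged sketch of how such a profile could be wired follows. The check names are invented, and .RECIPEPREFIX is used so the sketch does not depend on literal tab characters:

```shell
#!/bin/bash
# Hedged sketch of a Makefile-driven parallel health profile; the check
# names are invented, not the real /opt/health/Makefile contents.
workdir=$(mktemp -d)
cat > "$workdir/Makefile" <<'EOF'
.RECIPEPREFIX := >
deploy: check-api check-disk

check-api:
> @echo "api: ok"

check-disk:
> @echo "disk: ok"
EOF
# Same flags run-once.sh uses: parallel jobs, keep going on failures,
# and per-target output synchronization.
make --file="$workdir/Makefile" --jobs --keep-going --output-sync=target deploy
```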

After the basic health check is complete, the script continues verification by checking Kubernetes cluster readiness:

# Wait for K8s to be ready before proceed
# Approximately 5 minutes of timeout before failing
if ! vracli status first-boot -w 300; then
 die "Timeout expired"
fi

This command:

  1. Calls vracli status first-boot with the -w 300 parameter, meaning wait up to 300 seconds (5 minutes) for cluster readiness
  2. If after this time the cluster is still not ready, the script calls the die function with the “Timeout expired” message, which causes controlled termination of the deployment process

Environment state verification is not limited only to the initial stages – the script contains checkpoints distributed throughout the entire deployment process. For example, before deploying infrastructure services:

log_stage "Deploying infrastructure services"

# ... environment preparation ...

# Check if Kubernetes API is available
kubectl get nodes &> /dev/null || {
 echo "Kubernetes API is not responding"
 exit 1
}

# Verify etcd availability
vracli cluster etcd health || {
 echo "etcd is not healthy"
 exit 1
}

# ... continue deployment ...

After deploying key components, the script again verifies system state:

log_stage "Verifying core services"

# Wait for core services readiness
timeout 300s bash -c 'until kubectl -n prelude get pods | grep identity-service | grep -q Running; do sleep 5; done'
timeout 300s bash -c 'until kubectl -n prelude get pods | grep rabbitmq-ha | grep -q Running; do sleep 5; done'

# Check service status
vracli service status | grep -E "identity-service|rabbitmq-ha" | grep -qv Running && {
 echo "Core services are not running"
 exit 1
}

Such multi-layered verification ensures that:

  1. The base environment (Kubernetes, network, resources) is properly configured
  2. Basic infrastructure components are available and working correctly
  3. Key services have reached the “Running” state before continuing deployment
  4. The process will not continue if problems are detected, preventing inconsistent states

Additionally, the script implements verification mechanisms specific to individual components, for example for databases:

log_stage "Verifying database health"

# Check primary node availability for each database
for db in ${databases[@]}; do
 if ! vracli db status --dbname "$db" | grep -q "Primary node: Available"; then
 echo "Database $db primary node is not available"
 exit 1
 fi
done

# Verify replicas (in multi-DB mode)
if [[ "$MULTI_DB" == "true" ]]; then
 for db in ${databases[@]}; do
 if ! vracli db status --dbname "$db" | grep -q "Replicas: 2/2"; then
 echo "Database $db replicas are not fully available"
 exit 1
 fi
 done
fi

This comprehensive environment state verification system forms the foundation of a stable deployment process, ensuring that:

  • Each stage begins in a known, predictable state
  • Problems are detected as early as possible, before they cause cascading failures
  • The deployment process is deterministic and repeatable
  • The administrator receives clear messages about any problems
  • The environment is not left in an inconsistent state in case of failure

Thanks to these mechanisms, the deploy.sh script can reliably deploy the complex VMware Aria Automation environment, even in variable environmental conditions or unstable base infrastructure.

5. Advanced Database Configuration and Intelligent Backup Creation

Database management is one of the most advanced aspects of the deploy.sh script. This section implements complex mechanisms for handling various deployment scenarios, data migration, and ensuring high availability. The script uses the db_utils.sh module, which contains specialized functions for managing PostgreSQL databases in a container environment:

source /opt/scripts/db_utils.sh

log_stage "Backing up databases from existing pods"
# The backup_db_before_destroy function performs database backup before destroying existing data
backup_db_before_destroy "$MULTI_DB" "$DELETE_DATABASES" "$SHUTDOWN" "$NAMESPACE_PRELUDE"

The backup_db_before_destroy function implements complex decision logic to determine whether a data backup is required and in what mode:

backup_db_before_destroy()
{
 local multi_db="$1"
 local delete_databases="$2"
 local shutdown="$3"
 local namespace="$4"
 local multi_db_previous=""

 # If we're doing shutdown or deleting databases, migration is not required
 if [ "$shutdown" == "true" -o "$delete_databases" == "true" ]
 then
 export MULTI_DB_MIGRATE=false
 return 0
 fi

 # Detecting previous database configuration
 local database_directories=(/data/db/p-*)
 if [[ -d "/data/db/live" ]]
 then
 multi_db_previous=false
 elif [[ -d "${database_directories[0]}/live" ]]
 then
 multi_db_previous=true
 else
 export MULTI_DB_MIGRATE=false
 return 0
 fi

 # Check if there's been a change in mode (multi_db)
 if [ "$multi_db_previous" != "$multi_db" ]
 then
 export MULTI_DB_MIGRATE=true
 else
 export MULTI_DB_MIGRATE=false
 return 0
 fi

 # Prepare backup directory and perform data dump
 vracli cluster exec -- bash -c 'mkdir -p /data/db/migrate'
 export MULTI_DB_BACKUP=$(mktemp -d --dry-run /data/db/migrate/XXX)
 vracli cluster exec -- bash -c "mkdir -p ${MULTI_DB_BACKUP}"

 dump_all_databases "$namespace" "$MULTI_DB_BACKUP"
}

This code fragment contains advanced logic:

  1. Parameter Analysis – The function analyzes passed parameters (multi_db, delete_databases, shutdown) to determine the action strategy.


  2. Automatic Database Topology Detection – By checking directory structures on disk (/data/db/live for single-DB or /data/db/p-*/live for multi-DB), the function automatically determines whether the previous deployment used single-DB or multi-DB configuration.


  3. Mode Change Detection – Comparing the detected configuration with the requested one (the multi_db parameter) allows determining if there has been a mode change that requires data migration.


  4. Migration Preparation – If a mode change is detected, the function:


  • Creates a directory for migration (/data/db/migrate)
  • Generates a unique name for the backup directory
  • Performs a dump of all databases

The dump_all_databases function is responsible for creating backups of all databases:

function dump_all_databases()
{
 namespace="$1"
 backup_dir="$2"
 for database in $(kubectl get configmap db-settings -n ${namespace} -o json | jq -r ".data| keys[]"| grep -v "postgres" | grep -v "repmgr-db")
 do
 dump_database "$database" "$backup_dir"
 done
}

function dump_database()
{
 local database="$1"
 local backup_dir="$2"
 vracli cluster exec -- bash -c "vracli db dump ${database} > ${backup_dir}/${database}.sql || rm ${backup_dir}/${database}.sql"
}

This function:

  1. Gets a list of databases from the db-settings ConfigMap, skipping the postgres and repmgr-db databases, which are system databases
  2. For each database calls the dump_database function, which:
  • Performs a database dump to an SQL file
  • In case of error, removes the file to prevent trying to restore a corrupted backup
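
The dump-or-delete guard can be demonstrated without vracli. In this sketch, fake_dump is an invented stand-in that succeeds for one database and fails for the other:

```shell
#!/bin/bash
# Demo of the `dump > file || rm file` pattern from dump_database: a
# failed dump must not leave a truncated .sql file behind.
backup_dir=$(mktemp -d)
fake_dump() {
  # Stand-in for `vracli db dump`: fails for "broken-db".
  if [ "$1" = "broken-db" ]; then return 1; fi
  echo "-- sql dump of $1"
}
for database in good-db broken-db; do
  fake_dump "$database" > "$backup_dir/$database.sql" \
    || rm "$backup_dir/$database.sql"
done
```

After the loop, only good-db.sql remains; the partial file for broken-db was removed.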

After creating backups and potentially deleting existing databases, the script initiates the database deployment process, using the upstall_postgres function:

function upstall_postgres()
{
 local multi_db="$1"
 local migrate="$2"
 local backup_dir="$3"
 local values="$4"
 local namespace="$5"
 local upstall_status_dir="$6"

 # Database deployment
 deploy_databases "$multi_db" "$values" "$namespace" "$upstall_status_dir"

 # Data migration, if required
 if [[ "$migrate" == "true" ]]
 then
 migrate_stored_data "$namespace" "$backup_dir"
 fi

 # Cleanup, depending on mode
 if [[ "$multi_db" == "true" ]]
 then
 helm-upstall postgres-measurer "" "$namespace"
 vracli cluster exec -- bash -c "rm -rf /data/db/live; rm -rf /data/db/backup; rm -rf /data/db/flags"
 else
 vracli cluster exec -- bash -c "rm -rf /data/db/p-*"
 fi
}

The deploy_databases function is responsible for parallel database deployment, which significantly speeds up the process:

function deploy_databases()
{
 local multi_db="$1"
 local values="$2"
 local namespace="$3"
 local upstall_status_dir="$4"
 local databases=$(kubectl get configmap db-settings -n ${namespace} -o json | jq -r ".data| keys[]"| grep -v "postgres" | grep -v "repmgr")

 if [[ "$multi_db" == false ]]
 then
 databases=("postgres")
 fi

 for database in ${databases[@]}
 do
 deploy_database "$database" "$values" "$namespace" "$upstall_status_dir"
 done
 wait
}

This function:

  1. In single-DB mode, uses only one database (postgres)
  2. In multi-DB mode, gets a list of all required databases from the ConfigMap
  3. For each database runs the deploy_database function in the background (asynchronously)
  4. At the end calls wait to wait for all parallel processes to complete
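
The fan-out-and-wait pattern used here can be shown with a stand-in for the real deployment call:

```shell
#!/bin/bash
# Demo of the background fan-out + wait pattern from deploy_databases;
# fake_deploy is an invented stand-in for the real helm-upstall call.
results_dir=$(mktemp -d)
fake_deploy() {
  sleep 0.1                       # simulate deployment work
  echo "deployed $1" > "$results_dir/$1"
}
for database in db-one db-two db-three; do
  fake_deploy "$database" &       # launch each deployment in background
done
wait                              # block until every job has finished
```

The total runtime is roughly that of the slowest job rather than the sum of all jobs, which is exactly why deploy.sh parallelizes database deployment.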

The deploy_database function configures and deploys a single database:

function deploy_database()
{
 local database="$1"
 local values="$2"
 local namespace="$3"
 local upstall_status_dir="$4"
 local data_directory_path="/data/db"
 local release_name="$database"
 if [[ "$database" != "postgres" ]]
 then
 release_name=$(echo "$release_name" | sed "s/-db//;s/-//g")
 release_name="p-${release_name}"
 data_directory_path="${data_directory_path}/${release_name}"
 values="${values},multiDB=true"
 else
 values="${values},multiDB=false"
 fi
 vracli cluster exec -- bash -c "rm -f ${data_directory_path}/live/pg_stat/repmgrd_state.txt"
 helm-upstall postgres "${values},releaseName=${release_name},dbName=${database}" "${namespace}" '' '' 7200 CHECK_DIR=${upstall_status_dir} &
}

This function performs the following operations:

  1. Determines the Helm release name and data directory path, depending on whether it’s the main database (postgres) or a dedicated service database
  2. Clears the repmgrd state file, ensuring proper initialization of the replication cluster
  3. Calls helm-upstall with appropriate parameters, including:
  • Release name (modified if it’s a dedicated database)
  • Database name
  • Long timeout (7200 seconds) to ensure sufficient time for initialization
  • Status directory for progress monitoring
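The release-name transformation in step 1 is easy to exercise in isolation. A minimal sketch, using the hypothetical dedicated databases abx-db and user-profile-db as inputs:

```shell
#!/bin/bash
# Reproduces the release-name mangling from deploy_database for a
# dedicated service database: strip the "-db" suffix, drop any
# remaining hyphens, then prepend "p-". The main database
# ("postgres") keeps its name unchanged.
to_release_name() {
  local database="$1"
  local release_name="$database"
  if [[ "$database" != "postgres" ]]; then
    release_name=$(echo "$release_name" | sed "s/-db//;s/-//g")
    release_name="p-${release_name}"
  fi
  echo "$release_name"
}

to_release_name "abx-db"            # prints "p-abx"
to_release_name "user-profile-db"   # prints "p-userprofile"
to_release_name "postgres"          # prints "postgres"
```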

If data migration is needed (switching from single-DB to multi-DB mode or vice versa), the migrate_stored_data function restores the previously saved data:
If data migration is needed (switching from single-DB to multi-DB mode or vice versa), the migrate_stored_data function restores the previously saved data:

function migrate_stored_data()
{
 local namespace="$1"
 local backup_dir="$2"
 local databases=$(kubectl get configmap db-settings -n ${namespace} -o json | jq -r ".data| keys[]"| grep -v "postgres" | grep -v "repmgr-db")
 for database in ${databases[@]}
 do
 local backup_file="${backup_dir}/${database}.sql"
 if [[ -s ${backup_file} ]]
 then
 vracli cluster exec -- bash -c "vracli db restore --dbname ${database} ${backup_file} &> /dev/null"
 else
 exit 1
 fi
 done
 vracli cluster exec -- bash -c "rm -rf ${backup_dir}"
}

This function:

  1. Gets a list of databases (again skipping the system databases postgres and repmgr-db)
  2. For each database checks if the backup file exists and is not empty
  3. If so, restores the database from the backup using vracli db restore
  4. After completion, removes the backup directory
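The key safety check in step 2 is [[ -s file ]], which succeeds only for a file that exists and is non-empty. A self-contained sketch of that guard (file names are illustrative):

```shell
#!/bin/bash
# Sketch of the [[ -s file ]] guard from migrate_stored_data:
# -s is true only when the file exists AND is larger than zero
# bytes, so a missing or truncated dump aborts the restore path
# instead of silently proceeding with no data.
check_backup() {
  local backup_file="$1"
  if [[ -s "$backup_file" ]]; then
    echo "restore from $backup_file"
  else
    echo "missing or empty backup: $backup_file" >&2
    return 1          # migrate_stored_data would `exit 1` here
  fi
}

backup_dir=$(mktemp -d)
echo "-- dump --" > "${backup_dir}/abx-db.sql"   # non-empty: restorable
touch "${backup_dir}/catalog-db.sql"             # zero bytes: rejected

check_backup "${backup_dir}/abx-db.sql"
check_backup "${backup_dir}/catalog-db.sql" || echo "restore aborted"
rm -rf "$backup_dir"
```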

Additionally, for high availability environments, the script contains functions for monitoring and balancing primary nodes in the cluster:

function get_primaries()
{
 database_pods=$(kubectl get pods -n prelude -o custom-columns=:metadata.name,:spec.containers[0].image | grep db-image | cut -d " " -f1 | grep "0")
 for pod in ${database_pods[@]}
 do
 pod_data=$(kubectl exec -n prelude ${pod} -- bash -c "chpst -u postgres repmgr node check --upstream 2>/dev/null")
 if [[ "$pod_data" =~ "primary" ]]
 then
 primary="$pod"
 else
 primary=$(echo "$pod_data" | sed -r "s/.*upstream.*\"([^.]*)\..*/\1/")
 fi
 echo "${primary}"
 done
}

function draw_table()
{
 local pods_in_0=()
 local pods_in_1=()
 local pods_in_2=()

 for pod in $(get_primaries)
 do
 local node_id=$(echo $pod | grep -Eo "[0-9]+")
 if [[ $node_id == 0 ]]
 then
 pods_in_0+=($pod)
 elif [[ $node_id == 1 ]]
 then
 pods_in_1+=($pod)
 else
 pods_in_2+=($pod)
 fi
 done
 output=""
 for i in {0..30}
 do
 if [ -n "${pods_in_0[$i]}" -o -n "${pods_in_1[$i]}" -o -n "${pods_in_2[$i]}" ]
 then
 output="$output${pods_in_0[$i]}, ${pods_in_1[$i]}, ${pods_in_2[$i]}\n"
 else
 break
 fi
 done
 echo -ne $output | column -t -N Node0,Node1,Node2 -o "|" -s ','
}

These functions:

  1. Identify the primary node for each database
  2. Group them by cluster node (Node0, Node1, Node2)
  3. Generate a clear table showing the distribution of primary nodes, which is key to understanding high availability topology

This entire advanced database management system ensures:

  • Flexibility in configuration (single-DB vs multi-DB)
  • Automatic data migration when changing modes
  • Intelligent backup creation before potentially destructive operations
  • Parallel database deployment to speed up the process
  • Support for high availability clusters with replication
  • Clear visualization of database topology

Thanks to these mechanisms, the deploy.sh script can reliably manage databases in various configurations, ensuring both data security and optimal resource utilization.

6. Controlled Stopping and Removal of Existing Deployment

The deploy.sh script implements a thoughtful, multi-stage process for stopping and removing an existing deployment, ensuring safe and controlled environment cleanup before reinstallation. This phase is crucial for ensuring that the new deployment starts in a clean, predictable state, without remnants of the previous configuration.

This process is initiated in the section marked as “Tear down existing deployment”:

log_stage "Tear down existing deployment"

# Graceful service stopping using the svc-stop.sh script
timeout 300s /opt/scripts/svc-stop.sh --force 2> /dev/null || true

# If the QUICK option was not selected, wait an additional 120 seconds
if [ "$QUICK" = false ] ; then
 sleep 120
fi

This initial code fragment performs the following tasks:

  1. Graceful Service Stopping – The script calls /opt/scripts/svc-stop.sh --force, which methodically stops all services in a controlled manner. The --force parameter ensures that the operation will continue even if there are problems with some services.


  2. Timeout for Long Operations – The timeout 300s command establishes a maximum time of 5 minutes to complete the stopping operation, which prevents the script from hanging in case of problems.


  3. Error Ignoring – The || true operator ensures that the script will continue even if the service stopping returns an error, which is important for operation idempotence (e.g., when services are already stopped).


  4. Stabilization Period – If the quick deployment option (QUICK) wasn’t selected, the script waits an additional 120 seconds, giving time for processes to fully terminate, resources to be released, and the system to stabilize.
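The timeout-plus-`|| true` idiom from points 2 and 3 can be demonstrated on its own; the sleep below stands in for svc-stop.sh:

```shell
#!/bin/bash
# Sketch of the `timeout ... || true` idiom: timeout kills the
# command when the limit expires and exits with status 124 (per
# GNU coreutils), and `|| true` swallows that status so the
# surrounding script keeps going even under `set -e`.
set -e

timeout 1s sleep 10 || true   # would block for 10s; killed after 1s
echo "script continues despite the timeout"
```

Without the `|| true`, the non-zero exit status of an expired timeout would abort any script running under `set -e`, which is precisely the failure mode deploy.sh is guarding against.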


The svc-stop.sh script performs many tasks related to safely stopping services:

# Fragment from svc-stop.sh
wait_deploy_health() {
 while true; do
 echo Health check iteration
 /opt/health/run-once.sh deploy && break || sleep 5
 done
}
export -f wait_deploy_health

if [[ "$@" != *"--force"* ]]; then
 timeout 300s bash -c wait_deploy_health
fi

helm ls -n prelude --short | grep -o -Fx -f /etc/vmware-prelude/services.list | xargs -r -t -n 1 -P 0 helm uninstall -n prelude --timeout=1200s

This script:

  1. Ensures the environment is in a stable state (health check)
  2. Identifies installed Helm releases from the service list
  3. Calls helm uninstall for each service, with a long timeout (20 minutes) for safe stopping
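The grep -o -Fx -f filter in the uninstall pipeline can be tried without a cluster; the release names below are illustrative stand-ins for the helm ls output and the services list:

```shell
#!/bin/bash
# Sketch of the release filter from svc-stop.sh: grep -F (fixed
# strings), -x (whole-line match), -f FILE (patterns read from a
# file) keeps only the lines that appear verbatim in the services
# list, which is how installed Helm releases are selected for
# uninstall.
services_list=$(mktemp)
printf '%s\n' provisioning catalog abx > "$services_list"

# Stand-in for `helm ls -n prelude --short`:
printf '%s\n' provisioning catalog abx postgres rabbitmq-ha |
  grep -o -Fx -f "$services_list"

rm -f "$services_list"
```

Only `provisioning`, `catalog`, and `abx` survive the filter; `postgres` and `rabbitmq-ha` are not in the list and are therefore left alone.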

After stopping services, deploy.sh proceeds to remove Kubernetes namespaces:

# Removing Kubernetes namespaces
k8s_delete_namespace "${NAMESPACE_INGRESS}" 300
k8s_delete_namespace "${NAMESPACE_PRELUDE}" 600

# Removing clusterrolebinding for prelude
timeout 300s kubectl delete clusterrolebinding "${NAMESPACE_PRELUDE}"-view 2> /dev/null || true

The k8s_delete_namespace function implements advanced logic for removing namespaces with retry mechanisms and handling “stuck” pods:

function k8s_delete_namespace() {
 local ns="$1"
 [[ -z "$2" ]] && local timeout=300 || local timeout="$2"

 # Delete namespace with time limit
 timeout "$timeout"s kubectl delete namespace "${ns}" 2> /dev/null || true

 # Wait until namespace completely disappears (check every 5 seconds)
 local retry_interval=5
 local retries_count=$((timeout / retry_interval))
 until [ $retries_count -eq 0 ]; do
 ((retries_count-=1))
 local found=$(kubectl get namespaces --no-headers | cut -f 1 -d ' ' | grep -x "$ns" | wc -l)
 [[ -z "$found" ]] && local found=0
 if [ $found -eq 0 ]; then
 return 0
 else
 sleep $retry_interval
 fi
 vracli cluster exec -- /opt/scripts/kill_stale_pods.sh "$1" || true
 done
 return 1
}

This complex function:

  1. Initiates Namespace Deletion – Calls kubectl delete namespace with a specified timeout, which starts the cleanup process.


  2. Monitors the Deletion Process – In a loop, checks if the namespace still exists, using the kubectl get namespaces command with filtering.


  3. Eliminates “Stuck” Pods – During waiting, calls the kill_stale_pods.sh script, which identifies and terminates pods that may be blocking namespace deletion (e.g., due to stuck finalizers).


  4. Handles Timeout – If after a specified time (default 300 or 600 seconds) the namespace still exists, the function returns an error, which may indicate problems with resources blocking deletion.
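The retry skeleton of this function can be sketched without Kubernetes; a plain file stands in for the namespace, and the timings are illustrative:

```shell
#!/bin/bash
# Skeleton of the retry loop from k8s_delete_namespace: poll until
# the resource is gone or the retry budget (timeout / interval) is
# exhausted. A file plays the role of the namespace here.
wait_until_gone() {
  local resource="$1"
  local timeout="${2:-10}"
  local retry_interval=1
  local retries_count=$((timeout / retry_interval))
  until [ "$retries_count" -eq 0 ]; do
    retries_count=$((retries_count - 1))
    if [ ! -e "$resource" ]; then
      return 0                   # resource disappeared: success
    fi
    sleep "$retry_interval"
    # this is where k8s_delete_namespace would kick stuck pods
  done
  return 1                       # still present after the timeout
}

marker=$(mktemp)
(sleep 1; rm -f "$marker") &     # deleted asynchronously, like a namespace
wait_until_gone "$marker" 5 && echo "resource gone"
wait
```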


The kill_stale_pods.sh script implements low-level mechanisms to identify and terminate problematic pods:

#!/bin/bash

node=$(current_node)

for pod in $(kubectl get pods -n "$1" --field-selector="spec.nodeName=$node" -o jsonpath='{.items[*].metadata.uid}'); do
 for id in $(docker ps -aq --no-trunc --filter="label=io.kubernetes.pod.uid=$pod" ); do
 pkill -9 -ef "^containerd-shim .*moby/$id"
 done
done

This script:

  1. Identifies pods in a specified namespace assigned to the current node
  2. For each pod finds its corresponding Docker containers
  3. Terminates containerd-shim processes associated with these containers

If the DELETE_DATABASES option is active, the deploy.sh script additionally cleans persistent data from disk:

if [ "$DELETE_DATABASES" = true ] ; then
 log_stage "Deleting persisted data"
 vracli cluster exec -- bash -c 'find /data/db -maxdepth 2 -type d -name live -printf "%P\n" | xargs -I {} rm -rf /data/db/{}'
 vracli cluster exec -- bash -c 'rm -rf /data/openldap; mkdir -p /data/openldap'
 vracli reset rabbitmq --confirm || true
 kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/clientsecrets", "value": ""}]' || true
 kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/crdssecret", "value": ""}]' || true
fi

This code fragment performs comprehensive data cleaning:

  1. Removing PostgreSQL Data – Identifies and removes database directories (/data/db/.../live).


  2. Cleaning LDAP – Removes the /data/openldap directory and creates a new, empty one.


  3. RabbitMQ Reset – Calls vracli reset rabbitmq --confirm, which cleans the message broker configuration.


  4. Secret Cleaning – Updates the vaconfig configuration object, removing saved client secrets and database secrets.


After completing the cleanup process, the script is ready to begin a new deployment in a fresh, clean environment.

The entire process of stopping and removing the existing deployment is designed to provide:

  • Safe and controlled service stopping
  • Methodical removal of Kubernetes resources
  • Intelligent handling of “stuck” pod problems
  • Optional persistent data removal
  • Resilience to errors and unpredictable states

Thanks to these mechanisms, the deploy.sh script ensures that the new deployment begins in a clean, predictable state, which is key to a reliable installation or update of VMware Aria Automation.

7. Infrastructure Initialization: Namespace and SSL Certificate Management

After the cleanup phase, the deploy.sh script proceeds to initialize basic infrastructure, creating necessary Kubernetes namespaces and configuring the SSL certificate management system. This phase is fundamental to the entire deployment process as it establishes a secure foundation on which subsequent components will be built.

log_stage "Creating kubernetes namespaces"

# Create the ingress namespace, if necessary
k8s_create_namespace "${NAMESPACE_INGRESS}"

# Create the prelude namespace, if necessary
k8s_create_namespace "${NAMESPACE_PRELUDE}"

The k8s_create_namespace function implements idempotent Kubernetes namespace creation:

function k8s_create_namespace() {
 local ns="$1"
 if [[ $(kubectl get namespaces --no-headers | cut -f 1 -d ' ' | grep -x "$ns" | wc -l) == 0 ]]; then
 kubectl create namespace "$ns"
 fi
}

This function:

  1. Checks if a namespace with the given name already exists
  2. If not, creates it using kubectl create namespace
  3. Thanks to the checking condition, the function is idempotent – it can be safely called multiple times without risk of errors
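The check-then-create idiom is worth seeing in isolation. A minimal sketch with a directory standing in for the namespace:

```shell
#!/bin/bash
# Idempotent-create pattern from k8s_create_namespace, with a
# directory standing in for the Kubernetes namespace: check for
# existence first and create only when missing, so repeated calls
# are harmless.
create_once() {
  local dir="$1"
  if [[ ! -d "$dir" ]]; then
    mkdir "$dir"
    echo "created $dir"
  else
    echo "$dir already exists, nothing to do"
  fi
}

workdir=$(mktemp -d)
create_once "$workdir/ns-prelude"   # first call creates
create_once "$workdir/ns-prelude"   # second call is a no-op
rm -rf "$workdir"
```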

After creating namespaces, the script proceeds to configure SSL certificates:

log_stage "Applying ingress certificate"
/opt/scripts/prepare_certs.sh
/opt/scripts/apply_certs.sh

The prepare_certs.sh script handles generating or retrieving SSL certificates:

#!/bin/bash
set -e

# Generate self-signed certificate for ingress if such does not exist
vracli certificate ingress --list &>/dev/null || vracli certificate ingress --generate auto --set stdin

This simple but effective script:

  1. Checks if an ingress certificate already exists (using vracli certificate ingress --list)
  2. If it doesn’t exist, generates a new, self-signed certificate using vracli certificate ingress --generate auto
  3. The --set stdin option allows interactively specifying certificate parameters (though in this case default values are used)

After preparing certificates, the apply_certs.sh script installs them in the Kubernetes cluster:

#!/bin/bash

CERT_INGRESS_PEM=$(mktemp --suffix=ingress.pem)
CERT_INGRESS_KEY=$(mktemp --suffix=ingress.key)
CERT_FQDN_PEM=$(mktemp --suffix=fqdn.pem)
CERT_PROXY_PEM=$(mktemp --suffix=proxy.pem)

# Remove existing secrets if they exist
if [[ $(kubectl get secrets -n ingress | grep cert-ingress | wc -c) -gt 0 ]]; then
 kubectl delete secret -n ingress cert-ingress
fi

if [[ $(kubectl get secrets -n prelude | grep cert-ext | wc -c) -gt 0 ]]; then
 kubectl delete secret -n prelude cert-ext
fi

# Get ingress certificate and key
vracli certificate ingress --list-key > $CERT_INGRESS_KEY
vracli certificate ingress --list > $CERT_INGRESS_PEM

# Create TLS secret for ingress
kubectl create secret tls cert-ingress \
 --cert=${CERT_INGRESS_PEM} \
 --key=${CERT_INGRESS_KEY} \
 -n ingress || exit $?

rm -f ${CERT_INGRESS_KEY}

# Handle load-balancer and proxy certificates
vracli certificate load-balancer --list
if [[ $? = 0 ]]
then
 vracli certificate load-balancer --list > $CERT_FQDN_PEM
else
 mv $CERT_INGRESS_PEM $CERT_FQDN_PEM
fi

kubectl_cmd="kubectl -n prelude create secret generic cert-ext --from-file=fqdn.pem=${CERT_FQDN_PEM} "
vracli certificate proxy --list > $CERT_PROXY_PEM && kubectl_cmd="$kubectl_cmd --from-file=https_proxy.pem=${CERT_PROXY_PEM}"

echo $kubectl_cmd
$kubectl_cmd

rm -f $CERT_INGRESS_PEM $CERT_FQDN_PEM $CERT_PROXY_PEM || true

This more complex script:

  1. Creates Temporary Files – Uses mktemp to create temporary files with appropriate suffixes.


  2. Removes Existing Secrets – Checks if the cert-ingress and cert-ext secrets already exist and, if so, removes them to avoid conflicts.


  3. Gets Certificates and Keys – Uses vracli certificate to retrieve the ingress certificate and its private key.


  4. Creates TLS Secret – Uses kubectl create secret tls to create a secret in the ingress namespace, which will be used by the ingress controller for TLS termination.


  5. Secures Private Key – Immediately removes the temporary file containing the private key, minimizing the risk of its exposure.


  6. Handles Load-Balancer Certificates – Tries to retrieve the load-balancer certificate, and if it doesn’t exist, uses the ingress certificate as a substitute.


  7. Dynamically Builds Command – Constructs a kubectl command dynamically, depending on proxy certificate availability.


  8. Creates Certificate Secret – Executes the built command, creating the cert-ext secret in the prelude namespace, containing FQDN certificates and optionally proxy.


  9. Cleans Temporary Files – Removes all temporary files, regardless of operation outcome.


After configuring certificates, the script calls apply_profiles.sh, which applies configuration profiles:

/opt/scripts/apply_profiles.sh

The apply_profiles.sh script is responsible for activating and configuring system profiles that can modify standard platform behavior:

#!/bin/bash
set -uo pipefail
shopt -s nullglob

export PRELUDE_PROFILE_ROOT=/etc/vmware-prelude/profiles

# Check parameters
if [[ "$#" != 0 ]]; then
 echo 'This command takes no arguments' >&2
 exit 1
fi

# Iterate through all profiles
for profile in "$PRELUDE_PROFILE_ROOT"/*; do
 export PRELUDE_PROFILE_PATH="$profile"
 profile_name="${profile##*/}"

 # Check profile structure correctness
 if [[ ! -x "$profile"/check ]]; then
 echo "Profile $profile_name malformed: check not executable" >&2
 fi

 # Execute check script to verify if profile should be active
 if "$profile"/check; then
 echo "Profile $profile_name: enabled" >&2

 # Apply Helm overrides if they exist
 if [[ -e "$profile/helm" ]]; then
 /opt/scripts/apply-override-dir "$profile/helm" "$profile_name.profile.prelude.vmware.com" || {
 echo "Profile $profile_name: failed to apply helm overrides" >&2
 exit 1
 }
 fi

 # Execute on-active script if it exists
 if [[ -e "$profile/on-active" ]] || [[ -h "$profile/on-active" ]]; then
 "$profile/on-active" || {
 err="$?"
 echo "Profile $profile_name: on-active failed with status $err" >&2
 exit 1
 }
 fi
 else
 # Exit code 1 means "normally inactive". Any other code is an error.
 err="$?"
 if [[ "$err" != 1 ]]; then
 echo "Profile $profile_name: check failed with status $err" >&2
 exit 1
 fi
 echo "Profile $profile_name: disabled" >&2

 # Execute on-inactive script if it exists
 if [[ -e "$profile/on-inactive" ]] || [[ -h "$profile/on-inactive" ]]; then
 "$profile/on-inactive" || {
 err="$?"
 echo "Profile $profile_name: on-inactive failed with status $err" >&2
 exit 1
 }
 fi
 fi
done

This advanced script performs the following tasks:

  1. Environment Configuration – Sets shell options and defines the profile directory.


  2. Profile Iteration – Searches the /etc/vmware-prelude/profiles directory and processes each found profile.


  3. Structure Checking – Verifies if the profile contains an executable check script.


  4. Activation State Determination – Calls the profile’s check script to determine if it should be active.


  • Exit code 0 means the profile should be active
  • Code 1 means the profile should be inactive
  • Any other code is treated as an error
  5. Active Profile Configuration Application – For active profiles:
  • If a helm directory exists, calls the apply-override-dir script to apply Helm configuration overrides
  • If an on-active script exists, executes it
  6. Inactive Profile Handling – For inactive profiles:
  • If an on-inactive script exists, executes it

A system profile is a directory containing:

  • A check script – determining if the profile should be active or not
  • A helm directory – containing files that override Helm configuration values
  • An on-active script – executed when the profile is active
  • An on-inactive script – executed when the profile is inactive
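The exit-code contract of the check script can be sketched as a small dispatcher; the profile names and commands below are illustrative:

```shell
#!/bin/bash
# Sketch of the profile activation contract from apply_profiles.sh:
# the check command's exit code selects the action -- 0 = active,
# 1 = normally inactive, anything else = error.
dispatch_profile() {
  local name="$1"; shift
  if "$@"; then                      # run the profile's check
    echo "Profile $name: enabled"
  else
    local err="$?"
    if [[ "$err" != 1 ]]; then       # only exit code 1 is benign
      echo "Profile $name: check failed with status $err"
      return 1
    fi
    echo "Profile $name: disabled"
  fi
}

dispatch_profile demo-on true                       # exit 0 -> enabled
dispatch_profile demo-off false                     # exit 1 -> disabled
dispatch_profile demo-err bash -c 'exit 3' || true  # exit 3 -> error
```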

This plugin architecture allows extending the functionality and customizing the behavior of the deploy.sh script without modifying its source code, which is key to maintainability and extensibility.

It’s worth noting the apply-override-dir script, which is used to apply Helm configuration overrides from profiles:

#!/bin/bash
set -ueo pipefail

progname="$0"

die() {
 echo "$progname: $1" >&2
 exit 1
}

# ... argument processing ...

for f in "$1"/*.yaml; do
 [[ -f "$f" ]] || continue
 chart_name="${f##*/}"
 chart_name="${chart_name%.yaml}"
 echo "Applying $chart_name override from $1"
 /opt/scripts/apply-override -n "$namespace" -p "$priority" -s "$chart_name" "$name" < "$f"
done

This script:

  1. Iterates through YAML files in the profile directory
  2. For each file extracts the chart name (from filename)
  3. Calls the apply-override script to apply overrides for the specific chart
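The two parameter expansions doing the work in step 2, ${f##*/} and ${name%.yaml}, can be tested directly; the paths below are illustrative:

```shell
#!/bin/bash
# The chart-name extraction from apply-override-dir in isolation:
# ${f##*/} strips the longest */ prefix (the directory part), and
# ${name%.yaml} strips the .yaml suffix.
chart_name_of() {
  local f="$1"
  local chart_name="${f##*/}"        # basename: drop directories
  chart_name="${chart_name%.yaml}"   # drop the .yaml extension
  echo "$chart_name"
}

chart_name_of /etc/vmware-prelude/profiles/demo/helm/rabbitmq-ha.yaml
# prints "rabbitmq-ha"
```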

This entire infrastructure initialization phase:

  • Creates necessary Kubernetes namespaces
  • Configures SSL certificates for secure communication
  • Applies system profiles, customizing platform behavior
  • Establishes a secure and flexible base for further component deployment

Thanks to these mechanisms, the deploy.sh script ensures that the basic infrastructure is properly configured and ready to accept application components, which is a key step in the VMware Aria Automation deployment process.

8. Network Connection Configuration: etcd, proxy, and NTP

After initializing the basic Kubernetes infrastructure, the deploy.sh script proceeds to configure key network services that are essential for the proper functioning of the entire ecosystem. This phase includes configuring the etcd data storage system, HTTP proxy settings, and time synchronization (NTP).

log_stage "Updating etcd configuration to include https_proxy if such exists"
vracli proxy show || {
 vracli proxy set-default
 vracli proxy show
}

vracli proxy update-etcd

#
# Show and apply NTP configuration if such exists.
#
vracli ntp show-config || true
vracli ntp status || true

This code fragment performs several important tasks:

1. HTTP Proxy System Configuration

The first step is to ensure that the HTTP proxy is properly configured:

vracli proxy show || {
 vracli proxy set-default
 vracli proxy show
}

This code block:

  1. Checks Current Proxy Configuration – Calls vracli proxy show, which displays current HTTP proxy settings.
  2. Sets Default Configuration if Needed – If the command returns an error (proxy is not configured), executes the code block in curly braces:
  • vracli proxy set-default – sets default proxy configuration
  • vracli proxy show – displays the newly set configuration

The vracli proxy tool is an advanced component that manages HTTP proxy configuration for the entire environment. It can:

  • Set URL addresses for HTTP and HTTPS proxies
  • Configure exception lists (hosts and domains that should not use proxy)
  • Set credentials if the proxy requires authentication
  • Distribute proxy configuration to all system components

2. etcd Configuration Update

After ensuring that the proxy is properly configured, the script updates the configuration in etcd:

vracli proxy update-etcd

Etcd is a distributed key-value database that plays a critical role in the Kubernetes and VMware Aria Automation ecosystem:

  • It stores Kubernetes cluster configuration
  • Contains data about component states
  • Stores configuration settings for various services

The vracli proxy update-etcd command propagates proxy settings to etcd, which means:

  1. All components can access current proxy configuration
  2. New pods will automatically receive proper settings
  3. Configuration is stored in a central location, making it easier to manage

This operation is particularly important in corporate environments where access to external resources is controlled by HTTP proxies and where improper proxy configuration can lead to connectivity issues.

3. NTP Verification and Configuration

The last element of this phase is verification and potential configuration of time synchronization:

vracli ntp show-config || true
vracli ntp status || true

These commands:

  1. Display Current NTP Configuration – vracli ntp show-config shows which NTP servers are configured.
  2. Check Synchronization Status – vracli ntp status verifies if the system clock is properly synchronized.

The || true operator after both commands ensures that the script will continue even if these commands return an error (e.g., when NTP is not configured).

Proper time synchronization is critically important for a distributed environment for several reasons:

  • It enables consistent logging and monitoring
  • It’s essential for cryptographic protocols and authorization mechanisms
  • It affects the proper functioning of transaction mechanisms in databases
  • It ensures correct functioning of cache and data expiration mechanisms

Although the script does not explicitly configure NTP, it displays the current state, allowing the administrator to verify if time synchronization is correct. If needed, the administrator can use the vracli ntp set command to configure NTP servers.

It’s worth noting that vracli is an advanced platform management tool for VMware Aria Automation that encapsulates many configuration operations, simplifying the management process. For network configurations like proxy and NTP, this tool provides:

  • A consistent interface for various configuration operations
  • Input data validation
  • Setting propagation to all components
  • Configuration correctness verification

This entire network connection configuration phase ensures that the VMware Aria Automation environment is properly configured in terms of:

  • Access to external resources (via proxy)
  • Configuration storage and distribution (via etcd)
  • Time synchronization (via NTP)

These elements are fundamental to the proper functioning of the entire ecosystem and ensure that all components can reliably communicate both internally and with external resources.

9. Credential Management and Service Deployment Using Helm

After configuring the basic infrastructure and network services, the deploy.sh script proceeds to one of the most critical stages – generating credentials, managing secrets, and deploying service components using Helm. This phase is key to ensuring platform security and its proper functioning.

cd /opt/charts

log_stage "Deploying infrastructure services"

set +x

# Prepare credentials and database configuration
source /opt/scripts/persistence_utils.sh
credentials_load
/opt/scripts/generate_credentials.sh
credentials_save

# Load database settings
helm-upstall db-settings "" "${NAMESPACE_PRELUDE}"

The first stage of this phase is preparing and managing credentials:

  1. Change Working Directory – The script changes to the /opt/charts directory, where Helm chart definitions for deployed services are located.


  2. Disable Command Display – The set +x instruction turns off command display, which is crucial for security since subsequent operations involve sensitive data (passwords, keys).


  3. Load Persistent Data Management Tools – The script sources /opt/scripts/persistence_utils.sh, which contains functions for managing secrets and configuration.


  4. Load Existing Credentials – The credentials_load function restores previously saved credentials from the configuration object (CRD) or initializes new ones:


credentials_load() {
 tmpfile=$(mktemp)
 kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.crdssecret' | base64 -d > $tmpfile
 if [ ! -s $tmpfile ]; then
 rm -f $tmpfile
 kubectl -n prelude create secret generic db-credentials --from-literal=postgres=change_me
 else
 kubectl apply -f $tmpfile
 rm -f $tmpfile
 fi
}

This function:

  • Creates a temporary file
  • Gets the encoded crdssecret value from the configuration object
  • Decodes it from base64 and writes to the file
  • If the file is empty (no saved credentials), creates an empty secret with default values
  • Otherwise applies the saved secret using kubectl apply
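Stripped of kubectl, the crdssecret round trip is a base64 encode/decode of a manifest. A minimal sketch (the manifest content is illustrative; -w 0 disables line wrapping, as in the real script, so the value can be embedded in a JSON patch):

```shell
#!/bin/bash
# Sketch of the crdssecret round trip between credentials_save and
# credentials_load: the secret manifest is base64-encoded for
# storage in the vaconfig object and decoded back on the next run.
manifest='apiVersion: v1
kind: Secret
metadata:
  name: db-credentials'

stored=$(printf '%s' "$manifest" | base64 -w 0)   # credentials_save side
restored=$(printf '%s' "$stored" | base64 -d)     # credentials_load side

[ "$restored" = "$manifest" ] && echo "round trip intact"
```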
  5. Generate New Credentials – The script calls /opt/scripts/generate_credentials.sh, which creates a comprehensive set of credentials for various platform components:

#!/bin/bash
set -e

source /opt/scripts/persistence_utils.sh

# Generate passwords for databases
credential_add_from_command "postgres" /opt/scripts/generate_pass.sh
credential_add_from_command "repmgr-db" /opt/scripts/generate_pass.sh
credential_add_from_command "abx-db" /opt/scripts/generate_pass.sh
# ... other databases ...

# Generate passwords for OpenLDAP
credential_add_from_command "openldap-admin" /opt/scripts/generate_pass.sh
credential_add_from_command "openldap-config" /opt/scripts/generate_pass.sh

# Generate encryption keys
credential_add_from_command "identity-encoder-salt" /opt/scripts/generate_encryption_key_base64.sh 32
credential_add_from_command "project-encryption-key" /opt/scripts/generate_encryption_key_base64.sh 48
credential_add_from_command "encryption-keys.json" bash -c 'echo "{\"primary\":1,\"keys\":[{\"version\":1,\"value\":\"$(/opt/scripts/generate_encryption_key_base64.sh 32)\"}]}"'
credential_add_from_command "key" /opt/scripts/generate_encryption_key.sh 48
credential_add_from_command "rsaKey" /opt/scripts/generate_rsa_encryption_key.sh 2048

# RabbitMQ configuration
credential_add_from_command "rabbitmq" /opt/scripts/generate_pass.sh
credential_add_from_command "rabbitmqConfig" /opt/scripts/generate_rmq_config.sh "$(credential_get "rabbitmq")"
credential_add_from_command "rabbitmq-erlang-cookie" /opt/scripts/generate_pass.sh

This script generates various types of credentials:

  • Database passwords – random alphanumeric strings
  • Base64-encoded encryption keys – used for encoding sensitive data
  • RSA keys – used for signing JWT tokens
  • RabbitMQ configuration – including erlang cookie, which is critical for clustering

The credential_add_from_command function calls a specified command and adds its result to the secret:

credential_add_from_command() {
 local key=$1
 shift

 if [ "$1" == "--force" ]; then
 shift
 elif credential_exists "$key"; then
 return 0
 fi

 value=$("$@" | base64 -w 0)
 kubectl patch secret db-credentials -n prelude --type=json -p="[{\"op\":\"add\", \"path\":\"/data/$key\", \"value\":\"$value\"}]"
}
  6. Save Generated Credentials – The credentials_save function saves the updated secret back to the configuration object:

credentials_save() {
 secrets=$(kubectl get secret db-credentials -n prelude -o yaml | base64)
 crdssecret=$(echo $secrets | tr -d '\n')
 kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/crdssecret", "value": "'"$crdssecret"'"}]'
}
  7. Load Database Settings – The script calls helm-upstall db-settings, which creates a ConfigMap with settings for databases:

helm-upstall db-settings "" "${NAMESPACE_PRELUDE}"

After preparing credentials, the script generates SSH keys for PostgreSQL and saves them as a Kubernetes secret:

# Generate SSH keys for PostgreSQL
SSH_DIR=$(mktemp -d /tmp/ssh-keys-XXXXXXXX)
ssh-keygen -N "" -f $SSH_DIR/id_rsa
kubectl -n ${NAMESPACE_PRELUDE} create secret generic postgres-ssh \
 --from-literal=private-key=$(base64 -w 0 $SSH_DIR/id_rsa) \
 --from-literal=public-key=$(base64 -w 0 $SSH_DIR/id_rsa.pub)
rm -rf $SSH_DIR

This code fragment:

  1. Creates a temporary directory
  2. Generates an SSH key pair without a password (-N "")
  3. Creates a postgres-ssh secret with the encoded key pair
  4. Removes the temporary directory, minimizing risk of key exposure

Next, the script deploys services using Helm. An important aspect is parallel execution of these operations, which significantly speeds up the deployment process:

set -x

# Set directory for installation statuses
export UPSTALL_STATUS_DIR=/tmp/deploy_$(date +%Y%m%d%H%M%S)
mkdir -p $UPSTALL_STATUS_DIR

# Parallel service deployment using Helm
helm-upstall endpoint-secrets "INGRESS_URL=${INGRESS_URL},INGRESS_CERT=${INGRESS_CERT},NODE_NAMES=${NODE_NAMES}" "$NAMESPACE_PRELUDE" &
helm-upstall no-license "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &
helm-upstall rabbitmq-ha "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &

This code fragment performs the following tasks:

  1. Enable Command Display – The set -x instruction restores command display mode since credential operations have been completed.


  2. Create Status Directory – The script creates a /tmp/deploy_YYYYMMDDHHMMSS directory, which will be used to monitor the status of parallel Helm installations.


  3. Parallel Service Deployment – The helm-upstall function is called three times, with an & symbol at the end, meaning background (asynchronous) execution:


  • endpoint-secrets – endpoint secret configuration
  • no-license – licensing service
  • rabbitmq-ha – RabbitMQ cluster with high availability

The helm-upstall function wraps the call to the helm-upstall script, which combines helm upgrade and helm install operations:

helm-upstall() {
 # ... initialization code ...
 
 /opt/scripts/helm-upstall --namespace="$3" --release-name="$release_name" --chart-path="$service_name" --set-string="$2" --set="$4" --timeout="$6" $5 || result=$?
 
 # ... result handling code ...
}

This function ensures idempotence of the deployment operation – it will work correctly regardless of whether a given Helm chart was previously installed or not.

The CHECK_DIR=${UPSTALL_STATUS_DIR} parameter causes the operation status to be saved to a file, allowing later checking if all parallel installations completed successfully:

check-helm-upstalls-status() {
 trap "clear-helm-upstalls-status $1" RETURN

 # Check if there are failed Helm install/upgrade operations
 check_files_count=$(cat $1/*.check | wc -l)
 failure_count=$(cat $1/*.check | grep 1 | wc -l)

 echo Failure counts are ${failure_count}, from $check_files_count finished 
 if [ "${failure_count}" -gt "0" ]; then
 log_stage "There are failed install/upgrade of helm releases"
 return 1
 fi
 return 0
}

This function:

  1. Counts the number of status files and the number of failures
  2. If there are failures, it reports this and returns an error code
  3. Otherwise, it returns success
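The status-file mechanism can be reproduced in miniature: each background job writes its exit code to a .check file, and a final pass counts failures exactly as check-helm-upstalls-status does. The job names and simulated exit codes below are illustrative:

```shell
#!/usr/bin/env bash
# Miniature reproduction of the CHECK_DIR pattern: background jobs write
# their exit codes to .check files; a final pass counts the failures.
set -u

status_dir=$(mktemp -d)

run_job() {  # $1 = job name, $2 = simulated exit code
  echo "$2" > "${status_dir}/$1.check"
}

run_job endpoint-secrets 0 &
run_job no-license 0 &
run_job rabbitmq-ha 1 &   # simulate one failed install/upgrade
wait

check_files_count=$(cat "${status_dir}"/*.check | wc -l)
failure_count=$(cat "${status_dir}"/*.check | grep -c 1)

echo "Failure counts are ${failure_count}, from ${check_files_count} finished"
rm -rf "${status_dir}"
```

With one simulated failure out of three jobs, the final line reports `Failure counts are 1, from 3 finished`, which is exactly the condition that would make check-helm-upstalls-status return 1.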

This entire phase of credential management and service deployment is crucial for the security and functionality of the VMware Aria Automation platform:

  • It ensures secure storage and distribution of credentials
  • It generates unique, cryptographically strong passwords and keys
  • It saves credentials as Kubernetes secrets
  • It efficiently deploys services through parallel execution
  • It monitors deployment status to ensure reliability

This advanced implementation guarantees that:

  1. Each deployment has unique, strong credentials
  2. Credentials are securely stored
  3. The deployment process is idempotent and reliable
  4. Deployment is time-efficient due to parallel task execution

10. Advanced Identity Configuration and Authentication Mechanisms

After the phase of deploying basic infrastructure services, the deploy.sh script proceeds to configure the identity system and authentication mechanisms. This section is crucial for the security of the entire platform as it establishes the foundations of authorization and access control.

set +x
if output=$(vracli vidm); then
 identity_profile=vidm
 admin_client_id=$(echo "${output}" | jq -r '.clients|.ClientID')
 admin_client_secret=$(echo "${output}" | jq -r '.clients|.ClientSecret')
 vidm_client_id_user=$(echo "${output}" | jq -r '.clients|.ClientIDUser')
 org_owner=$(echo "${output}" | jq -r '.user')
 
 # Storing the prelude clients which will be used later in generate_client_ids.sh
 PRELUDE_CLIENTS="$admin_client_id,$vidm_client_id_user"
elif ldap=$(kubectl get vaconfigs.prelude.vmware.com prelude-vaconfig -o json | jq -e .spec.ldap); then
 echo "
#####################################################
# LDAP deployments are not meant for production use #
# and are not supported in HA environments! #
#####################################################
"
 identity_profile=ldap
 admin_client_id=$(echo $ldap | jq -r ".client_id")
 admin_client_secret=$(echo $ldap | jq -r ".client_secret")
 org_owner=$(echo $ldap | jq -r ".default_org_owner")
 
 PRELUDE_CLIENTS="$admin_client_id"
else
 echo "No vIDM configuration has been provided!"
 exit 1
fi
set +e

This code fragment implements intelligent detection of the available identity management system and automatically retrieves necessary authentication data:

  1. Disable Command Display – The set +x instruction prevents displaying sensitive information such as client IDs and secret keys.


  2. vIDM Detection and Configuration – The script first checks if vIDM (VMware Identity Manager) is configured by calling vracli vidm:


  • If the operation succeeds, it extracts client IDs and secret keys from the result
  • It also records the organization owner username
  • Initializes PRELUDE_CLIENTS as a list containing the administrator client ID and user ID
  3. Alternative LDAP Detection – If vIDM is not configured, the script checks whether an LDAP configuration is available:
  • Displays a warning that LDAP deployments are not intended for production use and are not supported in HA environments
  • Gets the client ID and secret key from the LDAP configuration
  • Records the default organization owner name
  • Initializes PRELUDE_CLIENTS with only the administrator client ID

  4. Handling Missing Configuration – If neither vIDM nor LDAP is configured, the script exits with an error, informing that no vIDM configuration was provided.


  5. Disable Strict Error Mode – The set +e instruction disables automatic script termination on errors, which is needed for subsequent operations that may legitimately return non-zero exit codes.
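The detection cascade reduces to a simple if/elif/else; in the runnable sketch below, vidm_config and ldap_config are stubs standing in for vracli vidm and the kubectl/jq LDAP query:

```shell
#!/usr/bin/env bash
# Control-flow sketch of the identity-system detection cascade. The stub
# functions stand in for `vracli vidm` and the kubectl/jq LDAP query.
vidm_config() { return 1; }                            # no vIDM configured
ldap_config() { echo '{"client_id":"ldap-admin"}'; }   # LDAP config found

if output=$(vidm_config); then
  identity_profile=vidm
elif ldap=$(ldap_config); then
  identity_profile=ldap
else
  echo "No vIDM configuration has been provided!" >&2
  exit 1
fi
echo "identity_profile=$identity_profile"   # prints identity_profile=ldap
```

Because `if output=$(cmd)` tests the command's exit status while also capturing its stdout, each branch both detects the configuration source and collects the data the later jq extractions need.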


After detecting the identity system and retrieving credentials, the script uses this information to generate OAuth client IDs for all components:

/opt/scripts/generate_client_ids.sh "$PRELUDE_CLIENTS"

The generate_client_ids.sh script is a complex script that:

  1. Processes templates from the /opt/charts/client-secrets/templates/ directory
  2. Generates or retrieves client IDs for each service
  3. Creates a ConfigMap containing a list of all managed clients
csp_generate_client_ids() {
 local identity_managed_clients=
 local cached_client_ids=,${CLIENT_SECRETS_VALUES}
 
 for file in /opt/charts/client-secrets/templates/*; do
 # Skip files that are not service templates
 if [[ "$file" == *"csp-fixture-job.yaml" || "$file" == *"dependencies.yaml" || "$file" == *"NOTES.txt" || "$file" == *"_helpers.tpl" ]]; then
 continue
 fi
 
 # Get service name and client ID prefix from template
 service_name=$(cat $file | grep -A 2 "metadata" | grep "name" | cut -d':' -f2 | sed -e 's/^[[:space:]]*//')
 service_client_id_prefix=$(cat $file | grep -A 2 "data" | grep "clientid" | cut -d':' -f2 | sed -e 's/^[[:space:]]*//' | cut -d'{' -f1)
 merged_service_name=$(echo $service_name | sed 's/-//g')
 
 # Check if ID is already cached
 if [[ "$cached_client_ids" == *",$merged_service_name="* ]]; then
 # Get existing client ID
 clientid=$(kubectl -n prelude get configmaps "$service_name" -o json | jq -r '.data.clientid')
 redirect_uri=$(kubectl -n prelude get configmaps "$service_name" -o json | jq -r '.data.redirecturi')
 # ... update redirect URI if changed ...
 else
 # Generate new client ID
 random_client_id_suffix=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 16)
 clientid="${service_client_id_prefix}${random_client_id_suffix}"
 service_suffix_helm_key="${merged_service_name}clientsuffix"
 CLIENT_SECRETS_VALUES="${CLIENT_SECRETS_VALUES},${merged_service_name}=${random_client_id_suffix}"
 fi
 
 # Add client ID to managed clients list
 identity_managed_clients+=", $clientid"
 done

 # Add predefined Prelude clients to the list
 if [[ "$PRELUDE_CLIENTS" && "$PRELUDE_CLIENTS" != ',' ]]; then
 IFS=, read -r -a prelude_clients_array <<< "$PRELUDE_CLIENTS"
 for client in "${prelude_clients_array[@]}"; do
 if [[ -n "$client" ]] && [[ "$client" != ' ' ]]; then
 identity_managed_clients+=", $client"
 fi
 done
 fi

 # Create ConfigMap with client list
 kubectl -n "$NAMESPACE_PRELUDE" create configmap identity-clients \
 --from-literal=clients="${identity_managed_clients:2}" --dry-run=client -o yaml | kubectl apply -f -
}

This advanced function:

  1. Iterates through template files in /opt/charts/client-secrets/templates/
  2. For each template:
  • Extracts service name and client ID prefix
  • Checks if the client ID has already been generated and saved (cached)
  • If so, retrieves it from ConfigMap
  • If not, generates a new, random suffix and creates the full client ID
  • Updates the CLIENT_SECRETS_VALUES variable with the new value
  3. Adds all generated client IDs to the identity_managed_clients list
  4. Also adds predefined client IDs from PRELUDE_CLIENTS
  5. Creates the identity-clients ConfigMap containing a list of all clients
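The suffix-generation step can be tried in isolation. The tr/head pipeline is the one used by csp_generate_client_ids; the prefix below is hypothetical:

```shell
#!/usr/bin/env bash
# The client-ID suffix generation in isolation: 16 alphanumeric characters
# drawn from /dev/urandom, appended to a (here: illustrative) prefix.
service_client_id_prefix="provisioning-"   # hypothetical example prefix

random_client_id_suffix=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 16)
clientid="${service_client_id_prefix}${random_client_id_suffix}"

echo "$clientid"
```

`tr -dc 'a-zA-Z0-9'` deletes everything except alphanumeric bytes from the random stream, and `head -c 16` truncates the result, so the suffix is always exactly 16 characters long.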

After generating client IDs, the deploy.sh script deploys all identity-related components:

log_stage "Deploying identity services"

# Deploy identity service charts
helm-upstall client-secrets "$CLIENT_SECRETS_VALUES" "$NAMESPACE_PRELUDE" && wait_release client-secrets

# Deploy the identity service if using LDAP (in vIDM case, identity service is not needed)
if [ "$identity_profile" = "ldap" ]; then
 helm-upstall identity "$VALUES" "$NAMESPACE_PRELUDE" && wait_release identity

 # Deploy openldap if using LDAP
 helm-upstall openldap "$VALUES" "$NAMESPACE_PRELUDE" && wait_release openldap
fi

This code fragment:

  1. Deploys the client-secrets chart, which contains OAuth client IDs
  2. In the case of using LDAP, also deploys the identity service (identity) and OpenLDAP
  3. Uses the wait_release function to wait until each chart is successfully deployed

The wait_release function monitors Helm release status:

wait_release() {
 local release=$1
 local timeout=${2:-300} # default 5 minutes
 
 echo "Waiting for release $release to be ready..."
 local start_time=$(date +%s)
 
 while true; do
 local status=$(helm status -n $NAMESPACE_PRELUDE $release -o json 2>/dev/null | jq -r '.info.status' 2>/dev/null)
 
 if [[ "$status" == "deployed" ]]; then
 echo "Release $release is ready"
 return 0
 fi
 
 local current_time=$(date +%s)
 local elapsed=$((current_time - start_time))
 
 if [[ $elapsed -gt $timeout ]]; then
 echo "Timeout waiting for release $release to be ready"
 return 1
 fi
 
 sleep 5
 done
}

After deploying identity services, the script registers endpoints for key components such as vRealize Orchestrator (vRO) and Action Based Extensibility (ABX):

log_stage "Registering service endpoints"

if [[ "$identity_profile" = "vidm" ]]; then
 /opt/scripts/register_vro_endpoint.sh
 if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
 /opt/scripts/register_abx_endpoint.sh
 fi
fi

The register_vro_endpoint.sh script contains advanced logic for detecting and configuring the vRO endpoint:

create_or_update_vro() {
 source /opt/scripts/retry_utils.sh
 set -o pipefail
 retry_backoff "5 15 45" "Failed to load existing vRO config" "load_existing_config"
 if [ ! -z "${CURRENT// }" ]
 then
 retry_backoff "5 15 45" "Failed to update existing vRO config" "update_existing"
 else
 retry_backoff "5 15 45" "Failed to register vRO" "register_vro"
 fi
 set +o pipefail
}

This function:

  1. Uses the retry_backoff mechanism to retry operations in case of temporary problems
  2. Tries to load existing vRO configuration
  3. If configuration exists, updates it
  4. Otherwise registers a new endpoint
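The article does not show retry_utils.sh itself; the sketch below is a plausible minimal retry_backoff with the same call signature (a delay list, an error message, and the name of the command to retry) – an assumption, not the actual implementation:

```shell
#!/usr/bin/env bash
# Plausible minimal retry_backoff (the real retry_utils.sh is not shown):
# $1 = space-separated delay list, $2 = error message, $3 = command name.
retry_backoff() {
  local delays=$1 message=$2 cmd=$3 delay
  "$cmd" && return 0
  for delay in $delays; do
    sleep "$delay"
    "$cmd" && return 0
  done
  echo "$message" >&2
  return 1
}

# Demo: a command that fails twice, then succeeds on the third attempt.
attempts=0
flaky() { attempts=$((attempts + 1)); [ "$attempts" -ge 3 ]; }

retry_backoff "0 0 0" "Failed to run flaky command" flaky
echo "succeeded after $attempts attempts"   # prints: succeeded after 3 attempts
```

With the real delay list "5 15 45", the same flaky command would succeed after 5 + 15 = 20 seconds of accumulated waiting.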

Similarly, the register_abx_endpoint.sh script registers an endpoint for the Action Based Extensibility service:

create_or_update_abx() {
 source /opt/scripts/retry_utils.sh

 set -o pipefail

 retry_backoff "5 15 45 135 405 1215" "Failed to query number of ABX endpoints" "count_abx_endpoints"

 if [ "$ABX_ENDPOINTS_COUNT" -gt "0" ]
 then
 retry_backoff "5 15 45 135 405 1215" "Failed to update existing ABX config" "update_existing"
 else
 retry_backoff "5 15 45 135 405 1215" "Failed to register ABX endpoint" "register_abx_endpoint"
 fi

 set +o pipefail
}

This entire identity and authentication configuration phase:

  1. Automatically detects the available identity management system (vIDM or LDAP)
  2. Retrieves necessary credentials and client IDs
  3. Generates unique OAuth client IDs for all components
  4. Deploys identity and authentication related services
  5. Registers endpoints for key components

Thanks to these mechanisms, the deploy.sh script ensures:

  • A consistent and secure identity and authentication system
  • Uniqueness of OAuth client IDs
  • Secure storage and distribution of credentials
  • Proper endpoint registration for service integration
  • Resilience to temporary problems through retry mechanisms

This comprehensive implementation is the foundation of security and integration of all VMware Aria Automation platform components.

11. Endpoint Registration and Specialized Component Configuration

After configuring the basic infrastructure and identity system, the deploy.sh script proceeds to register endpoints and configure specialized components. This phase includes registering endpoints for components such as vRealize Orchestrator (vRO) and Action Based Extensibility (ABX), as well as configuring other key specialized services.

log_stage "Registering service endpoints"

if [[ "$identity_profile" = "vidm" ]]; then
 /opt/scripts/register_vro_endpoint.sh
 if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
 /opt/scripts/register_abx_endpoint.sh
 fi
fi

This code fragment makes endpoint registration dependent on the previously detected identity profile (vIDM or LDAP) and configuration:

  1. Conditional vRO Registration – The script calls register_vro_endpoint.sh only in the case of vIDM configuration, as vRO integration requires this system.


  2. Conditional ABX Registration – Additionally, if extension support is enabled (ENABLE_EXTENSIBILITY_SUPPORT), the script also registers the ABX endpoint.


The register_vro_endpoint.sh script contains advanced logic for detecting, building a host filter, and configuring the vRO endpoint:

#!/bin/bash

# Verify $CSP_AUTH_TOKEN defined
: ${CSP_AUTH_TOKEN:?}
# Verify INGRESS_URL defined
: ${INGRESS_URL:?}

PROVISIONING_URL="http://provisioning-service.prelude.svc.cluster.local:8282"

CERT="$(vracli certificate load-balancer --list || vracli certificate ingress --list)"
CERT_JSON=$(jq --null-input --compact-output --arg str "$CERT" '$str')

build_host_filter() {
 local nodeList=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')
 declare -a nodeArray=($nodeList)

 local searchQuery="(name%20eq%20%27embedded-VRO%27)or(endpointProperties.hostName%20eq%20%27$INGRESS_URL:443%27)"

 for nodeName in "${nodeArray[@]}"; do
 if [ "${nodeName}" == "${FQDN}" ]
 then
 # single node
 echo "($searchQuery)"
 return
 else
 searchQuery+="or(endpointProperties.hostName%20eq%20%27https://$nodeName:443%27)"
 fi
 done

 echo "($searchQuery)"
}

load_existing_config() {
 local hostFilter=$(build_host_filter)
 CURRENT=$(curl -k -f $PROVISIONING_URL"/provisioning/mgmt/endpoints?expand&external&\$filter=((endpointType%20eq%20%27vro%27)and(customProperties.vroAuthType%20eq%20%27CSP%27)and$hostFilter)" \
 -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
 -H 'Cookie: csp-auth-token='$CSP_AUTH_TOKEN \
 | jq '.documents | .[] | .endpointProperties.certificate |= '"${CERT_JSON}"' | .endpointProperties.hostName |= "'$INGRESS_URL':443"')
}

update_existing() {
 curl -k -f -X PUT "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
 -H 'Cookie: csp-auth-token='$CSP_AUTH_TOKEN \
 -d "${CURRENT}"
}

register_vro() {
 curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
 -H 'Cookie: csp-auth-token='$CSP_AUTH_TOKEN \
 -d '{"endpointProperties":{"hostName":"'$INGRESS_URL':443","dcId":"0","privateKeyId":"vcoadmin","privateKey":"vcoadmin","certificate":'"${CERT_JSON}"',"acceptSelfSignedCertificate":true,"vroAuthType":"CSP"},"customProperties":{"isExternal":"true"},"endpointType":"vro","associatedEndpointLinks":[],"name":"embedded-VRO","tagLinks":[]}'
}

create_or_update_vro() {
 source /opt/scripts/retry_utils.sh
 set -o pipefail
 retry_backoff "5 15 45" "Failed to load existing vRO config" "load_existing_config"
 if [ ! -z "${CURRENT// }" ]
 then
 retry_backoff "5 15 45" "Failed to update existing vRO config" "update_existing"
 else
 retry_backoff "5 15 45" "Failed to register vRO" "register_vro"
 fi
 set +o pipefail
}

# Main execution
create_or_update_vro

This complex script performs the following tasks:

  1. Required Variable Verification – Checks if variables CSP_AUTH_TOKEN and INGRESS_URL are set, which is necessary for proper registration.


  2. Certificate Retrieval – Gets the load-balancer or ingress certificate and formats it as JSON.


  3. Host Filter Building – The build_host_filter function dynamically creates a search filter that includes:


  • An endpoint named “embedded-VRO”
  • Hosts with the ingress URL address
  • All cluster nodes (for multi-node environments)
  4. Loading Existing Configuration – The load_existing_config function checks whether the vRO endpoint already exists, using the built filter.


  5. Update or Registration – Depending on the check result, the script either updates the existing endpoint (update_existing) or registers a new one (register_vro).


  6. Retry Mechanism – All operations use retry_backoff with various delays to handle temporary problems.
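The filter construction can be exercised outside the cluster by replacing the kubectl node query with a fixed list. The hostnames below are made up, and readable, unencoded operators are used instead of URL-encoded ones for clarity:

```shell
#!/usr/bin/env bash
# The or-joined filter construction from build_host_filter, with a fixed
# node list instead of a live kubectl query. Hostnames are illustrative and
# operators are left unencoded for readability.
set -u

INGRESS_URL="vra.example.com"
nodeArray=(node-a.example.com node-b.example.com)

searchQuery="(name eq 'embedded-VRO') or (endpointProperties.hostName eq '${INGRESS_URL}:443')"
for nodeName in "${nodeArray[@]}"; do
  searchQuery+=" or (endpointProperties.hostName eq 'https://${nodeName}:443')"
done
echo "($searchQuery)"
```

Each cluster node contributes one additional or-clause, so the resulting filter matches the embedded-VRO endpoint whether it was registered under the ingress URL or under any individual node's hostname.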


Similarly, the register_abx_endpoint.sh script registers an endpoint for Action Based Extensibility:

#!/bin/bash

# Verify $CSP_AUTH_TOKEN defined
: ${CSP_AUTH_TOKEN:?}

source /opt/scripts/csp_functions.sh

PROVISIONING_URL="http://provisioning-service.prelude.svc.cluster.local:8282"

# Define OpenFaaS properties
OPENFAAS_ADDRESS="http://gateway.openfaas.svc.cluster.local:8080"
echo "OpenFaaS address: "${OPENFAAS_ADDRESS}

count_abx_endpoints() {
 ABX_ENDPOINTS_COUNT=$(curl -k -f $PROVISIONING_URL'/provisioning/mgmt/endpoints?enumerate&external&$filter=(endpointType%20eq%20%27abx.endpoint%27)' \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer '$CSP_AUTH_TOKEN | jq .totalCount)
}

register_abx_endpoint() {
 curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
 -d '{"endpointProperties":{"apiEndpoint":"'$OPENFAAS_ADDRESS'","privateKeyId":"","privateKey":""},"customProperties":{"isExternal":"true"},"endpointType":"abx.endpoint","associatedEndpointLinks":[],"name":"embedded-ABX-onprem","tagLinks":[]}'
}

create_or_update_abx() {
 source /opt/scripts/retry_utils.sh

 set -o pipefail

 retry_backoff "5 15 45 135 405 1215" "Failed to query number of ABX endpoints" "count_abx_endpoints"

 if [ "$ABX_ENDPOINTS_COUNT" -gt "0" ]
 then
 retry_backoff "5 15 45 135 405 1215" "Failed to update existing ABX config" "update_existing"
 else
 retry_backoff "5 15 45 135 405 1215" "Failed to register ABX endpoint" "register_abx_endpoint"
 fi

 set +o pipefail
}

# Main execution
create_or_update_abx

This script:

  1. Defines the OpenFaaS address (function engine used by ABX)
  2. Checks if the ABX endpoint already exists using count_abx_endpoints
  3. Depending on the result, updates the existing or registers a new endpoint
  4. Uses the retry_backoff mechanism with an exponential backoff schedule – each pause is three times longer than the previous one, from 5 seconds up to 1215 seconds
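A quick sum of the delay list shows the worst-case wait this schedule allows before giving up:

```shell
#!/usr/bin/env bash
# Worst-case wait implied by the ABX retry schedule: the sum of all delays.
total=0
for delay in 5 15 45 135 405 1215; do
  total=$((total + delay))
done
echo "worst-case wait: ${total}s"   # prints: worst-case wait: 1820s
```

1820 seconds is roughly 30 minutes – a deliberately generous window, since the provisioning-service may still be starting up when endpoint registration begins.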

After registering endpoints, the deploy.sh script deploys other specialized components, such as VMware Event Broker Appliance (VEBA) and analytics services:

log_stage "Deploying specialized components"

# Deploy VEBA if extensibility support is enabled
if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
 helm-upstall veba "$VALUES,extensibilityEnabled=true" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
fi

# Deploy analytics components if enabled
if [[ "$ENABLE_ANALYTICS" == "true" ]]; then
 helm-upstall analytics-collector "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
 helm-upstall analytics-service "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
fi

This code fragment shows a “feature flags” approach, where specialized components are deployed only when appropriate flags (like ENABLE_EXTENSIBILITY_SUPPORT or ENABLE_ANALYTICS) are enabled.

The next stage is configuring the organization alias, which is key for multi-tenant environments:

log_stage "Configuring organization alias"

if [[ "$identity_profile" = "vidm" ]]; then
 source /opt/scripts/vidm_functions.sh
 source /opt/scripts/csp_functions.sh
 
 # Set up variables for CSP
 DEFAULT_ORG_NAME=$(get_default_tenant_name)
 DEFAULT_ORG_ALIAS=$(get_default_tenant_alias "$admin_token")
 
 # Update organization alias in identity service
 csp_auth "$admin_client_id" "$admin_client_secret"
 csp_retrieve_orgs
 patch_identity_with_default_org_alias
fi

This fragment:

  1. Loads functions for handling vIDM and CSP
  2. Gets the name and alias of the default organization (tenant)
  3. Authenticates with CSP using administrator client ID and secret
  4. Retrieves organization information
  5. Updates the organization alias in the identity service

The patch_identity_with_default_org_alias function implements intelligent alias updating:

patch_identity_with_default_org_alias () {
 local is_alias_updated=$(kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.vidm.isDefaultOrgAliasUpdated')

 if [ "$is_alias_updated" = true ]; then
 timestamped_echo "The default organization alias is already updated in identity service."
 return 0
 fi

 local alias=$(kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.vidm.defaultOrgAlias //empty')

 if [ -z "$alias" ]; then
 timestamped_echo "Default organization doesn't have an alias."
 else
 timestamped_echo "Updating the default organization alias in identity service."
 identity_patch_alias "$alias"
 fi

 kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/vidm/isDefaultOrgAliasUpdated", "value": true}]'
}

This function:

  1. Checks if the alias has already been updated
  2. If not, gets the alias from configuration
  3. Updates the alias in the identity service
  4. Sets the isDefaultOrgAliasUpdated flag to true to avoid multiple updates

This entire phase of endpoint registration and specialized component configuration ensures:

  1. Integration of key components, such as vRO and ABX, with the identity system
  2. Conditional deployment of specialized components (VEBA, analytics)
  3. Proper configuration of organization aliases for multi-tenant environments
  4. Resilience to temporary problems through retry mechanisms

Thanks to these mechanisms, the deploy.sh script ensures comprehensive configuration and integration of all specialized components, which is key to the full functionality of the VMware Aria Automation platform.

12. Service Toggling and State Management

After completing the deployment and configuration phase of individual components, the deploy.sh script proceeds to the key stage of service toggling, which ensures that all services are enabled and configured according to requirements. This phase is essential because some services may require special startup or configuration after deployment.

log_stage "Toggling services"

# Store list of enabled services in vaconfig
/opt/scripts/store_enabled_svc.sh

# Get service states from vaconfig
STATES=$(kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.services.states // "{}"')

# Toggle services with appropriate parameters
/opt/scripts/toggle_services.sh "$VALUES" "$APP_SELECTOR" "$TARGET_PORT" "$EXTRA_VALUES" "$STATES" "" "true" "$UPSTALL_STATUS_DIR"

This code fragment performs several key tasks:

  1. Saving List of Enabled Services – The store_enabled_svc.sh script collects information about services that should be enabled and saves it in the configuration object:
#!/bin/bash

set -eu

SERVICES_TO_TOGGLE="$(cat /etc/vmware-prelude/services.list) $(/opt/scripts/capsvc_enabled.sh)"
SERVICES_TO_TOGGLE="${SERVICES_TO_TOGGLE// /$'\n'}"
SERVICES_TO_TOGGLE=$(echo "$SERVICES_TO_TOGGLE" | sort -u )

CAP_DISABLED_SERVICES="$(/opt/scripts/capsvc_disabled.sh)"
if [[ -n "$CAP_DISABLED_SERVICES" ]]; then
 CAP_DISABLED_SERVICES=(${CAP_DISABLED_SERVICES// / })
 for svc in "${CAP_DISABLED_SERVICES[@]}"
 do
 # remove the services, which should not be toggled, based on capability
 SERVICES_TO_TOGGLE=$(echo "$SERVICES_TO_TOGGLE" | sed "/^$svc$/d")
 done
fi
SERVICES_TO_TOGGLE="${SERVICES_TO_TOGGLE//$'\n'/ }"

kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/services/enabled-services", "value": "'"$SERVICES_TO_TOGGLE"'"}]' || true

This script:

  • Combines services from the services.list file with services returned by capsvc_enabled.sh
  • Removes duplicates and sorts the list
  • Removes services that should be disabled (from capsvc_disabled.sh)
  • Saves the final list in the vaconfig object under the path /spec/services/enabled-services
  2. Getting Service States – The script retrieves information about service states from the configuration object:
STATES=$(kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.services.states // "{}"')

This command extracts a JSON object containing current states of all services or returns an empty object {} if state information is not available.
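The jq `//` (alternative) operator is what supplies the `"{}"` default when the field is absent; this can be verified with two sample documents (requires jq on the PATH):

```shell
#!/usr/bin/env bash
# jq's // operator returns the right-hand default when the left-hand
# expression evaluates to null or false.
with_states='{"spec":{"services":{"states":{"identity-service":true}}}}'
without_states='{"spec":{"services":{}}}'

echo "$with_states"    | jq -cr '.spec.services.states // "{}"'   # prints {"identity-service":true}
echo "$without_states" | jq -cr '.spec.services.states // "{}"'   # prints {}
```

This guarantees that STATES is always valid JSON for the later `jq "has(...)"` checks, even on a fresh deployment where no state has been recorded yet.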

  3. Service Toggling – The most important stage is calling the toggle_services.sh script, which handles actual service toggling:
/opt/scripts/toggle_services.sh "$VALUES" "$APP_SELECTOR" "$TARGET_PORT" "$EXTRA_VALUES" "$STATES" "" "true" "$UPSTALL_STATUS_DIR"

This script accepts a number of parameters:

  • $VALUES – configuration value string
  • $APP_SELECTOR – application selector
  • $TARGET_PORT – target port
  • $EXTRA_VALUES – additional values
  • $STATES – JSON object with service state information
  • "" – empty string (exclusion of specific services)
  • "true" – force service restart
  • $UPSTALL_STATUS_DIR – directory for monitoring status

The toggle_services.sh script contains advanced logic for toggling services:

#!/bin/bash

set -e
set -x

VALUES=$1
APP_SELECTOR=$2
TARGET_PORT=$3
TOGGLES=$4
STATES=$5
EXCLUDE_SERVICES=$6
FORCE_SERVICES_RESTART=$7
UPSTALL_STATUS_DIR=$8

if [[ -z $STATES ]]; then
    STATES="{}"
fi

source /opt/scripts/helm_utils.sh

cd /opt/charts

export -f helm-toggle-state
export -f helm-upstall
export -f do-helm-upstall

source /opt/scripts/retry_utils.sh
export -f retry_backoff

allServices=$(kubectl get vaconfig prelude-vaconfig -o json | jq -ej '.spec.services."enabled-services"' | sed 's/\s\+/\n/g')

if [[ -n "$EXCLUDE_SERVICES" ]]; then
 services_to_excludeList=(${EXCLUDE_SERVICES//,/ })

 for i in "${services_to_excludeList[@]}"
 do
 allServices=$(echo "$allServices" | sed "/^$i$/d")
 done
fi
SERVICES_TO_TOGGLE="${allServices//$'\n'/ }"

for svc in ${allServices[@]}
do
 if [[ $(jq "has(\"$svc\")" <<< "$STATES") == false ]]; then
 STATES="$(jq ". + {\"$svc\": true}" <<< "$STATES")"
 fi
done

echo "$allServices" | xargs -t -n 1 -P 0 -I % bash -c "helm-toggle-state % '$VALUES' '$NAMESPACE_PRELUDE' '$APP_SELECTOR' '$TARGET_PORT' '$TOGGLES' '$STATES' '$FORCE_SERVICES_RESTART' '$UPSTALL_STATUS_DIR'"

This extensive script performs the following operations:

  1. Variable Initialization – Gets parameters passed from the main deploy.sh script.


  2. Loading Helper Modules – Uses functions from helm_utils.sh and retry_utils.sh for Helm chart management and retry mechanisms.


  3. Getting Service List – Extracts the list of services to be toggled from the configuration object.


  4. Handling Exclusions – If a list of services to exclude was passed, removes them from the list of services to toggle.


  5. State Initialization – For each service that doesn’t have a defined state in the STATES object, adds a default state of true.


  6. Parallel Service Toggling – Uses the xargs command with the -P 0 option (no process limit) to run the helm-toggle-state function in parallel for each service.
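The export -f / xargs fan-out can be demonstrated with a stub in place of helm-toggle-state; the function name, service names, and output directory below are illustrative:

```shell
#!/usr/bin/env bash
# The export -f / xargs -P fan-out used by toggle_services.sh, with a stub
# standing in for helm-toggle-state. All names here are illustrative.
set -u

outdir=$(mktemp -d)
export outdir

toggle_stub() { echo "toggled" > "${outdir}/$1"; }
export -f toggle_stub

# -P 0 runs as many parallel processes as possible (GNU xargs);
# -I % substitutes each input line into the bash -c command.
printf '%s\n' identity-service rabbitmq-ha orchestrator-service |
  xargs -P 0 -I % bash -c 'toggle_stub %'

count=$(ls "$outdir" | wc -l)
echo "toggled $count services"
rm -rf "$outdir"
```

`export -f` is the key detail: each `bash -c` child is a fresh shell, and it can only call toggle_stub (or, in the real script, helm-toggle-state) because the function definition was exported into its environment.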


The helm-toggle-state function is responsible for actual toggling of individual services:

helm-toggle-state() {
    local name=$1
    local values=$2
    local namespace=$3
    local app_selector=$4
    local target_port=$5
    local toggles=$6
    local states=$7
    local force_services_restart=$8
    local upstall_status_dir=$9

    local extra_values=""

    if echo $states | jq -e '."'$name'"' > /dev/null; then
        extra_values="disable=false,$toggles"
    else
        extra_values="disable=true,service.selector.app=$app_selector,service.port.targetPort=$target_port,$toggles"
    fi

 force_reinstall_flag=""

 if [[ "$force_services_restart" == "true" ]]; then
 force_reinstall_flag="--force-reinstall"
 fi

 local upstall_status_dir_arg=""
 if [ "$upstall_status_dir" != "" ]; then
 upstall_status_dir_arg="CHECK_DIR=${upstall_status_dir}"
 fi

    helm-upstall "$name" "$values" "$namespace" "$extra_values" "$force_reinstall_flag" "$upstall_status_dir_arg"
}

This function:

  1. Analyzes whether the service should be enabled or disabled based on the states object
  2. Sets appropriate values for Helm (disable=false or disable=true)
  3. Adds the --force-reinstall flag if service restart is required
  4. Calls the helm-upstall function with appropriate parameters

After toggling services, the script monitors their state:

log_stage "Waiting for services to start"

# Check status of all services
timeout 300s bash -c 'while vracli service status | grep -qEv "Running|Disabled|N/A"; do echo "Waiting for services..."; sleep 10; done'

# Verify that all required services are running
required_services="identity-service rabbitmq-ha orchestrator-service"
for service in $required_services; do
 if ! vracli service status | grep "$service" | grep -q "Running"; then
 log_stage "Service $service is not running. Deployment failed."
 exit 1
 fi
done

This fragment:

  1. Waits up to 300 seconds until all services reach the “Running”, “Disabled”, or “N/A” state
  2. Checks if key services (identity-service, rabbitmq-ha, orchestrator-service) are in the “Running” state
  3. If any of the key services is not running, ends the deployment with an error
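The same bounded-polling idea can be expressed as a self-contained sketch, with a stub replacing the `vracli service status | grep` pipeline:

```shell
#!/usr/bin/env bash
# Bounded polling expressed as a runnable sketch: check_running stands in
# for the real `vracli service status` pipeline.
set -u

poll_count=0
check_running() {  # stub: services become ready on the third poll
  poll_count=$((poll_count + 1))
  [ "$poll_count" -ge 3 ]
}

deadline=$((SECONDS + 300))
until check_running; do
  if [ "$SECONDS" -ge "$deadline" ]; then
    echo "Timeout waiting for services" >&2
    exit 1
  fi
  sleep 0   # the real loop sleeps 10 seconds between polls
done
echo "all services ready after $poll_count polls"   # prints: all services ready after 3 polls
```

Using bash's SECONDS counter for the deadline keeps the bound accurate regardless of how long each individual status check takes.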

The service toggling phase is key for several reasons:

  1. Activation Control – Allows selectively enabling or disabling services as needed
  2. Service Restart – Enables forcing service restart, which may be necessary after configuration changes
  3. Operation Verification – Ensures that all required services are active and working properly
  4. Parallel Execution – Speeds up the process through parallel operation execution
  5. Error Handling – Implements detection and reporting of service problems

Thanks to these mechanisms, the deploy.sh script ensures that all necessary services are properly started and configured, which is a key condition for the operation of the VMware Aria Automation platform.

13. Deployment Finalization and Ready State Setting

The final phase of the deploy.sh script includes operations finalizing the deployment, including cleaning of temporary resources, resetting database migration locks, setting the readiness flag, and notifying the user of successful deployment completion. This phase is crucial to ensure that the system is fully operational and ready to use.

# Force generation of new service status
vracli service status --ignore-cache || true

log_stage "Clearing liquibase locks"
vracli reset liquibase --confirm

# Set the deploy ready state and update generation for liagent lcc action
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/deploy/ready", "value": true},
 {"op": "replace", "path": "/spec/deploy/generation", "value": "'"$(date +%s)"'"}]'

# Final cleanup and success message
clear || true
echo
echo "Prelude has been deployed successfully"
echo

This code fragment performs several important tasks:

1. Forcing Generation of Current Service Status

vracli service status --ignore-cache || true

This command:

  • Calls vracli service status with the --ignore-cache option, which forces skipping the cache and retrieving the status of all services again
  • The || true operator ensures that the script will continue even if this command returns an error

Forcing service status refresh is important because:

  • It ensures the administrator receives the most up-to-date information about system state
  • It initializes the internal service status cache, which will speed up subsequent operations
  • It verifies that all services have been properly started

2. Clearing Liquibase Locks

log_stage "Clearing liquibase locks"
vracli reset liquibase --confirm

Liquibase is a tool used by many VMware Aria Automation components to manage database schema migrations. During migration, Liquibase establishes locks to prevent concurrent migrations that could damage the schema.

The vracli reset liquibase --confirm command:

  • Removes all Liquibase locks from databases
  • Requires confirmation (--confirm) to prevent accidental execution
  • Is crucial if a previous deployment was interrupted or ended with an error, which could leave locks behind

Clearing Liquibase locks is essential because:

  • Remaining locks can prevent future schema migrations
  • They can cause errors when starting services that use the migration mechanism
  • They ensure a clean state for future updates and configuration changes

3. Setting the Deployment Ready Flag

kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/deploy/ready", "value": true},
 {"op": "replace", "path": "/spec/deploy/generation", "value": "'"$(date +%s)"'"}]'

This command updates the prelude-vaconfig configuration object using a PATCH operation, making two changes:

  1. Sets the /spec/deploy/ready flag to true, signaling that deployment has completed and the system is ready to use
  2. Updates the /spec/deploy/generation value to the current Unix timestamp (number of seconds since January 1, 1970), which allows identifying the deployment version

Setting the readiness flag is crucial because:

  • It informs other system components that deployment has succeeded
  • It allows liagent agents to perform LCC (Life Cycle Configuration) actions
  • It serves as a reference point for monitoring and management tools
  • It provides a consistent way to check deployment state
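The quoting in this `kubectl patch` invocation is easy to get wrong, so it is worth seeing the payload assembled in isolation. The sketch below (plain bash, no kubectl) shows how the single-quoted JSON is briefly "opened" so the shell can splice in the timestamp from `$(date +%s)`:

```shell
# Build the same JSON patch payload that deploy.sh passes to kubectl patch.
# The '...'"$var"'...' splice embeds the current Unix timestamp into the
# otherwise single-quoted JSON string.
generation="$(date +%s)"
patch='[{"op": "add", "path": "/spec/deploy/ready", "value": true},
 {"op": "replace", "path": "/spec/deploy/generation", "value": "'"$generation"'"}]'
echo "$patch"
```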

4. Final Cleanup and Success Message

clear || true
echo
echo "Prelude has been deployed successfully"
echo

These commands:

  • Clear the screen (clear), to provide a good view of the final message (the || true operator ensures that the script will continue even if clear fails)
  • Display a clear message about successful deployment

The success message is important because:

  • It gives the user unambiguous confirmation that deployment has succeeded
  • It constitutes a clear end to the deployment process
  • It serves as a reference point in logs

This entire finalization phase ensures that:

  1. The system is in a ready-to-use state
  2. There are no remaining locks or unfinished operations
  3. The configuration is properly updated with the readiness flag
  4. The user receives clear confirmation of success

It’s worth noting that the script doesn’t end immediately after the success message: on exit it first runs the on_exit function registered at the beginning via trap on_exit EXIT. This function performs final cleanup operations:

on_exit() {
 # ... error handling code ...

 # Remove temporary helm_upstall check directory
 if [ -n "$UPSTALL_STATUS_DIR" ]; then
 clear-helm-upstalls-status $UPSTALL_STATUS_DIR true
 fi

 # Clear the value of property cache.timeout in vracli.conf file
 # Do not generate new service status
 vracli service status --unset-config service.status.cache.lifetime || true

 rm -rf /tmp/deploy.tmp.*
}

These operations ensure that:

  • Temporary Helm status files are removed
  • Service status cache configuration is restored to default values
  • All temporary files are cleaned up

Thanks to this comprehensive finalization phase, the deploy.sh script ensures that the system is left in a clean, consistent, and ready-to-use state, which is key to stable operation of the VMware Aria Automation platform.

Security and Advanced Credential Management

The deploy.sh script implements an extensive security and credential management system that is fundamental to ensuring protection of the entire VMware Aria Automation platform. This section analyzes how the script generates, stores, and distributes different types of credentials and how it ensures secure communication between components.

Password Generation and Management

One of the key aspects of platform security is generating strong, unique passwords for various components. The script uses specialized tools to create such credentials:

credential_add_from_command "postgres" /opt/scripts/generate_pass.sh
credential_add_from_command "redis" /opt/scripts/generate_pass.sh
credential_add_from_command "lemans-resources-db" /opt/scripts/generate_pass.sh

The credential_add_from_command function executes the given command and adds its result as a credential for a specified key:

credential_add_from_command() {
 local key=$1
 shift

 if [ "$1" == "--force" ]; then
 shift
 elif credential_exists "$key"; then
 return 0
 fi

 value=$("$@" | base64 -w 0)
 kubectl patch secret db-credentials -n prelude --type=json -p="[{\"op\":\"add\", \"path\":\"/data/$key\", \"value\":\"$value\"}]"
}

This function:

  1. Checks if the credential already exists (unless the --force option is used)
  2. Calls the specified password-generating command
  3. Base64 encodes the result
  4. Adds the encoded value to the db-credentials Kubernetes secret

The generate_pass.sh script creates a 32-character random alphanumeric string:

#!/bin/bash
tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 32

This password generator:

  1. Uses the /dev/urandom device as an entropy source, ensuring cryptographic quality randomness
  2. Filters the stream, keeping only alphanumeric characters
  3. Truncates the filtered stream to its first 32 characters
  4. Contains no predictable patterns or constant values

This approach ensures that each deployment has unique, strong passwords, which is key to platform security.
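The same pipeline can be exercised on its own; this is a direct sketch of what generate_pass.sh does:

```shell
# Sketch of the generate_pass.sh pipeline: read random bytes from
# /dev/urandom, keep only alphanumeric characters, stop after 32 of them.
pass="$(tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 32)"
echo "$pass"
```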

Generating Different Types of Cryptographic Keys

Besides passwords, the script generates various types of cryptographic keys that are used for different purposes:

# EncoderSalt key (32 bytes in base64)
credential_add_from_command "identity-encoder-salt" /opt/scripts/generate_encryption_key_base64.sh 32

# Encryption key (48 bytes)
credential_add_from_command "key" /opt/scripts/generate_encryption_key.sh 48

# RSA key for JWT (2048 bits)
credential_add_from_command "rsaKey" /opt/scripts/generate_rsa_encryption_key.sh 2048

# JSON objects with keys
credential_add_from_command "encryption-keys.json" bash -c 'echo "{\"primary\":1,\"keys\":[{\"version\":1,\"value\":\"$(/opt/scripts/generate_encryption_key_base64.sh 32)\"}]}"'

Each key type is generated using a specialized script:

  1. Base64-encoded keys (generate_encryption_key_base64.sh):
    #!/bin/bash
    /usr/bin/openssl rand -base64 "$1"

    This script:
  • Uses OpenSSL to generate a random byte string of the specified length
  • Encodes the result in base64, giving a text representation
  • Is used for keys that must be stored in text format
  2. Raw binary keys (generate_encryption_key.sh):
    #!/bin/bash
    /usr/bin/openssl rand "$1"

    This script:
  • Generates a random byte string without encoding
  • Is used for keys that will be processed internally in binary format
  3. RSA keys (generate_rsa_encryption_key.sh):
    #!/bin/bash
    /usr/bin/openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:"$1"

    This script:
  • Generates an RSA key pair of the specified length (e.g., 2048 bits)
  • Returns the private key in PEM format
  • Is used to generate keys for signing JWT tokens, asymmetric encryption, and other purposes
  4. Complex JSON objects with keys:
    echo "{\"primary\":1,\"keys\":[{\"version\":1,\"value\":\"$(/opt/scripts/generate_encryption_key_base64.sh 32)\"}]}"

    This construction:
  • Creates a JSON object containing an encryption key
  • Adds metadata such as the primary key ID and version
  • Is used by components that implement key rotation

Different key types are used for different purposes:

  • Salt keys are used in hashing processes
  • Symmetric keys are used for encrypting data at rest
  • RSA keys are used for asymmetric encryption and token signing
  • JSON objects with keys are used by components handling key rotation
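The JSON construction for rotation-aware components depends on careful escaping of the inner quotes. The sketch below builds the same shape of object, using base64-encoded bytes from /dev/urandom in place of the appliance's generator script:

```shell
# Build an encryption-keys.json-style object. The key material here comes
# straight from /dev/urandom instead of generate_encryption_key_base64.sh.
key_b64="$(head -c 32 /dev/urandom | base64 | tr -d '\n')"
json="{\"primary\":1,\"keys\":[{\"version\":1,\"value\":\"$key_b64\"}]}"
echo "$json"
```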

Secure Credential Storage

All generated credentials are securely stored in Kubernetes Secrets and then saved in the vaconfig object for persistence between deployments:

credentials_save() {
 secrets=$(kubectl get secret db-credentials -n prelude -o yaml | base64)
 crdssecret=$(echo $secrets | tr -d '\n')
 kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/crdssecret", "value": "'"$crdssecret"'"}]'
}

This function:

  1. Gets the entire db-credentials secret in YAML format
  2. Encodes it in base64
  3. Removes newline characters to get a uniform string
  4. Saves this string in the vaconfig object under the /spec/crdssecret path

During redeployment, credentials are restored from the vaconfig object:

credentials_load() {
 tmpfile=$(mktemp)
 kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.crdssecret' | base64 -d > $tmpfile
 if [ ! -s $tmpfile ]; then
 rm -f $tmpfile
 kubectl -n prelude create secret generic db-credentials --from-literal=postgres=change_me
 else
 kubectl apply -f $tmpfile
 rm -f $tmpfile
 fi
}

This function:

  1. Creates a temporary file
  2. Gets the encoded crdssecret value from the vaconfig object
  3. Decodes it from base64 and writes to the file
  4. If the file is empty (no saved credentials), creates a basic secret
  5. Otherwise applies the saved secret

This mechanism ensures:

  • Credential persistence between deployments
  • Abstraction layer through Kubernetes Secrets usage
  • Secure storage of sensitive data
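The save/load pair is essentially a base64 round trip. The sketch below reproduces it with a plain file standing in for the vaconfig object (no kubectl involved):

```shell
# Round-trip sketch of credentials_save/credentials_load: encode a secret
# document to a single base64 line, persist it, then restore it verbatim.
secret_yaml='apiVersion: v1
kind: Secret
metadata:
  name: db-credentials'
state_file="$(mktemp)"
# "save": encode the secret and strip newlines to get one uniform string
printf '%s' "$secret_yaml" | base64 | tr -d '\n' > "$state_file"
# "load": decode it back on the next deployment
restored="$(base64 -d < "$state_file")"
rm -f "$state_file"
```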

SSL Certificate Management

The deploy.sh script implements a comprehensive SSL certificate management system, which is key to secure communication between components:

log_stage "Applying ingress certificate"
/opt/scripts/prepare_certs.sh
/opt/scripts/apply_certs.sh

The prepare_certs.sh script generates a self-signed certificate if one doesn’t exist:

#!/bin/bash
set -e

# Generate self-signed certificate for ingress if such does not exist
vracli certificate ingress --list &>/dev/null || vracli certificate ingress --generate auto --set stdin

The apply_certs.sh script installs certificates in the Kubernetes cluster:

#!/bin/bash

CERT_INGRESS_PEM=$(mktemp --suffix=ingress.pem)
CERT_INGRESS_KEY=$(mktemp --suffix=ingress.key)
CERT_FQDN_PEM=$(mktemp --suffix=fqdn.pem)
CERT_PROXY_PEM=$(mktemp --suffix=proxy.pem)

# ... removing existing secrets ...

vracli certificate ingress --list-key > $CERT_INGRESS_KEY
vracli certificate ingress --list > $CERT_INGRESS_PEM

kubectl create secret tls cert-ingress \
 --cert=${CERT_INGRESS_PEM} \
 --key=${CERT_INGRESS_KEY} \
 -n ingress || exit $?

rm -f ${CERT_INGRESS_KEY}

# ... handling load-balancer and proxy certificates ...

This script:

  1. Creates temporary files for certificates
  2. Gets the ingress certificate and its private key
  3. Creates a TLS secret in the ingress namespace
  4. Immediately removes the file with the private key
  5. Handles load-balancer and proxy certificates

Additionally, certificates are used during endpoint registration:

CERT="$(vracli certificate load-balancer --list || vracli certificate ingress --list)"
CERT_JSON=$(jq --null-input --compact-output --arg str "$CERT" '$str')

# ... later in code ...
-d '{"endpointProperties":{"hostName":"'$INGRESS_URL':443","dcId":"0","privateKeyId":"vcoadmin","privateKey":"vcoadmin","certificate":'"${CERT_JSON}"',"acceptSelfSignedCertificate":true,"vroAuthType":"CSP"}...

This fragment:

  1. Gets the load-balancer certificate, or if it doesn’t exist, uses the ingress certificate
  2. Converts it to JSON format
  3. Includes it in endpoint registration data

The certificate management system ensures:

  • Automatic certificate generation if they don’t exist
  • Secure certificate storage in Kubernetes secrets
  • Immediate removal of sensitive data (private keys) after use
  • Consistent certificate usage throughout the system
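The vracli certificate commands are appliance-specific; as an illustration, the sketch below produces an equivalent self-signed certificate directly with openssl (the CN and file names are made up for the example):

```shell
# Generate a throwaway self-signed certificate and private key, as a
# stand-in for what `vracli certificate ingress --generate auto` produces.
workdir="$(mktemp -d)"
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$workdir/ingress.key" -out "$workdir/ingress.pem" \
  -days 365 -subj "/CN=aria.example.local" 2>/dev/null
# The key file would be handed to `kubectl create secret tls` and then
# deleted immediately, mirroring apply_certs.sh:
rm -f "$workdir/ingress.key"
```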

Sensitive Data Protection Mechanisms

The deploy.sh script implements several mechanisms ensuring protection of sensitive data:

  1. Disabling Command Display – Before operations on sensitive data:


    set +x

    This instruction disables displaying executed commands, preventing password and key disclosure in logs.


  2. Using Temporary Files with Automatic Removal – For operations requiring files:


    tmpfile=$(mktemp)
    # ... operations on file ...
    rm -f $tmpfile


    This pattern ensures that sensitive data is immediately removed after use.


  3. Base64 Encoding – For storing data in Kubernetes objects:


    value=$("$@" | base64 -w 0)

    Base64 encoding protects against accidental sensitive data disclosure in logs and debugging.


  4. Private Key Protection – After creating secrets with keys:


    rm -f ${CERT_INGRESS_KEY}

    Immediate removal of files with private keys minimizes their exposure risk.


  5. Storing Secrets in Dedicated Kubernetes Objects:


    kubectl -n prelude create secret generic db-credentials ...

    Using Kubernetes Secrets provides an additional protection layer, including:


  • Access control at RBAC level
  • Optional encryption at rest
  • Limited pod access
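Two of these mechanisms, temporary files with guaranteed removal and suppressed command echoing, combine naturally, as in this minimal sketch:

```shell
# Temp-file handling sketch: the trap removes the file on every exit path,
# and xtrace is switched off while the secret is in play.
tmpfile="$(mktemp)"
trap 'rm -f "$tmpfile"' EXIT   # removed even if a later command fails

set +x                         # stop echoing commands around the secret
printf '%s' "s3cr3t" > "$tmpfile"
secret="$(cat "$tmpfile")"
rm -f "$tmpfile"               # remove the file as soon as it is read
```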

This entire comprehensive security and credential management system ensures:

  • Strong, unique passwords and keys for each deployment
  • Various cryptographic key types tailored to specific needs
  • Secure credential storage and distribution
  • Sensitive data protection throughout the lifecycle
  • Automatic SSL certificate installation and configuration

Thanks to these mechanisms, the deploy.sh script establishes solid security foundations for the entire VMware Aria Automation platform, protecting both data and communication between components.

Component Architecture and Their Dependencies

The deploy.sh script manages a comprehensive ecosystem of cooperating components that together form the VMware Aria Automation platform. This section analyzes the architecture of these components, their roles, and the dependencies between them.

Infrastructure Components

The foundation of the platform are infrastructure components that provide basic services for other system elements:

1. Kubernetes – Container Orchestration Platform

Kubernetes forms the basis of the entire architecture, providing:

  • Container orchestration (pods, deployments, statefulsets)
  • Network management (services, ingress)
  • Persistent data storage (persistent volumes)
  • Configuration management (configmaps, secrets)
  • Automatic recovery after failures

The deploy.sh script heavily uses the Kubernetes API:

kubectl create namespace "${NAMESPACE_PRELUDE}"
kubectl apply -f ...
kubectl patch vaconfig prelude-vaconfig ...
kubectl get pods ...

2. Helm – Package Manager for Kubernetes

Helm is used to define, install, and update Kubernetes components:

helm-upstall db-settings "" "${NAMESPACE_PRELUDE}"
helm-upstall identity "$VALUES" "$NAMESPACE_PRELUDE"
helm-toggle-state "$name" "$values" "$namespace" "$extra_values"

The helm-upstall function is an advanced wrapper around helm upgrade/install that ensures operation idempotency.

3. PostgreSQL – Database System

PostgreSQL serves as the data storage layer for most platform components. The script handles two configurations:

  • Single-DB: One PostgreSQL instance serving all services
  • Multi-DB: Dedicated instances for each service, providing better isolation and performance

function deploy_databases()
{
 local multi_db="$1"
 # ...
 if [[ "$multi_db" == false ]]
 then
 databases=("postgres")
 else
 # Getting database list for each service
 databases=$(kubectl get configmap db-settings -n ${namespace} -o json | jq -r ".data| keys[]"| grep -v "postgres" | grep -v "repmgr")
 fi
 # ...
}

Databases are deployed with replication for high availability, with automatic primary-node detection:

function get_primaries()
{
 database_pods=$(kubectl get pods -n prelude -o custom-columns=:metadata.name,:spec.containers[0].image | grep db-image | cut -d " " -f1 | grep "0")
 for pod in ${database_pods[@]}
 do
 pod_data=$(kubectl exec -n prelude ${pod} -- bash -c "chpst -u postgres repmgr node check --upstream 2>/dev/null")
 # ... primary node detection ...
 done
}

4. RabbitMQ – Message Queuing System

RabbitMQ provides asynchronous communication between components, which is key for microservice architecture:

helm-upstall rabbitmq-ha "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &

RabbitMQ configuration includes:

  • Credential generation:
    credential_add_from_command "rabbitmq" /opt/scripts/generate_pass.sh
    credential_add_from_command "rabbitmqConfig" /opt/scripts/generate_rmq_config.sh "$(credential_get "rabbitmq")"
    credential_add_from_command "rabbitmq-erlang-cookie" /opt/scripts/generate_pass.sh

  • Configuration as a high-availability cluster
  • Definition of virtual hosts, users, and permissions

5. LDAP/vIDM – Identity Management Systems

The platform supports two identity management systems:

  • vIDM (VMware Identity Manager) – preferred for production environments
  • OpenLDAP – used in simpler deployments, not recommended for production

if output=$(vracli vidm); then
 identity_profile=vidm
 # ... vIDM configuration ...
elif ldap=$(kubectl get vaconfigs.prelude.vmware.com prelude-vaconfig -o json | jq -e .spec.ldap); then
 echo "
#####################################################
# LDAP deployments are not meant for production use #
# and are not supported in HA environments! #
#####################################################
"
 identity_profile=ldap
 # ... LDAP configuration ...
fi

The choice of identity system affects many subsequent operations, such as endpoint registration and service configuration.

Service Components

Based on the infrastructure, numerous service components operate, providing platform functionality:

1. Ingress – Incoming Traffic Management

Ingress Controller manages incoming traffic to the platform:

k8s_create_namespace "${NAMESPACE_INGRESS}"
kubectl create secret tls cert-ingress --cert=${CERT_INGRESS_PEM} --key=${CERT_INGRESS_KEY} -n ingress

It provides:

  • SSL/TLS termination
  • Host-name and path-based routing
  • Load balancing

2. Identity Service – Identity and Authorization Service

Identity Service is the central authentication and authorization point:

helm-upstall identity "$VALUES" "$NAMESPACE_PRELUDE" && wait_release identity

Its responsibilities include:

  • User authentication
  • Session management
  • OAuth token handling
  • vIDM or LDAP integration

3. Provisioning Service – Resource Provisioning Service

Provisioning Service manages endpoints and resources:

PROVISIONING_URL="http://provisioning-service.prelude.svc.cluster.local:8282"
# ... later in code ...
curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
 # ...

It is responsible for:

  • Endpoint registration and management
  • Resource provisioning
  • Communication with external systems

4. vRealize Orchestrator (vRO) – Orchestration Engine

vRO enables orchestration of complex processes:

/opt/scripts/register_vro_endpoint.sh

This component:

  • Executes workflows
  • Integrates with external systems
  • Provides API for automation

5. Action Based Extensibility (ABX) – Extension Mechanism

ABX provides serverless computing functions for the platform:

if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
 /opt/scripts/register_abx_endpoint.sh
fi

It’s based on OpenFaaS:

OPENFAAS_ADDRESS="http://gateway.openfaas.svc.cluster.local:8080"

And enables:

  • Function execution in response to events
  • Platform extension without modifying core code
  • Integration with external systems

6. Adapter Host Service – Adapter Hosting Service

if [[ "$ENABLE_ADAPTER_HOST_SVC" == "true" ]]; then
 VALUES="$VALUES,enableAdapterHostSvc=true"
else
 VALUES="$VALUES,enableAdapterHostSvc=false"
fi

This service:

  • Hosts integration adapters
  • Serves as a bridge between the platform and external systems
  • Isolates integration logic

Helper Components

In addition to main infrastructure and service components, the platform includes helper tools:

1. CSP (Cloud Services Platform) – Cloud Services Platform

CSP is used to manage identity, organizations, and services:

source /opt/scripts/csp_functions.sh
csp_auth "$admin_client_id" "$admin_client_secret"
csp_retrieve_orgs

This component:

  • Manages OAuth clients
  • Handles organizations and services
  • Provides API for administrative operations

2. Liquibase – Database Schema Migration System

Liquibase automates database schema management:

log_stage "Clearing liquibase locks"
vracli reset liquibase --confirm

It provides:

  • Schema versioning
  • Controlled migrations
  • Safe data structure updates

3. etcd – Cluster Configuration Storage System

etcd stores cluster and application configuration:

vracli proxy update-etcd

It is used for:

  • Storing Kubernetes configuration
  • Distributing settings between components
  • Tracking service states

4. Health Check – System State Checking Mechanisms

The Health Check system monitors platform state:

wait_deploy_health() {
 while true; do
 /opt/health/run-once.sh deploy && break || sleep 5
 done
}

It provides:

  • Checking status of all components
  • Problem detection
  • System readiness tracking
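The wait_deploy_health loop above is a simple poll-until-healthy pattern. Here it is sketched with the health probe stubbed out (the stub fails twice before reporting healthy):

```shell
# Poll-until-healthy sketch modeled on wait_deploy_health; health_probe is
# a stand-in for /opt/health/run-once.sh deploy.
checks=0
health_probe() {
  checks=$((checks + 1))
  [ "$checks" -ge 3 ]   # unhealthy on the first two calls, then healthy
}
wait_healthy() {
  while true; do
    health_probe && break || sleep 0.1
  done
}
wait_healthy
```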

Component Dependency Diagram

Dependencies between components can be represented as follows:

                             +-------------+
                             |             |
                             | Kubernetes  |
                             |             |
                             +------+------+
                                    |
                   +-----------------+-----------------+
                  |                 |                 |
          +-------v------+  +------v-------+  +------v-------+
          |              |  |              |  |              |
          | PostgreSQL   |  |  RabbitMQ    |  | etcd         |
          |              |  |              |  |              |
          +-------+------+  +------+-------+  +------+-------+
                  |                |                 |
                  |                |                 |
          +-------v------+  +------v-------+  +------v-------+
          |              |  |              |  |              |
          | Identity     |  | CSP          |  | Health Check |
          | Service      |  |              |  |              |
          +-------+------+  +------+-------+  +--------------+
                  |                |
                  |                |
          +-------v------+  +------v-------+
          |              |  |              |
          | Provisioning |  | vRO          |
          | Service      |  |              |
          +-------+------+  +------+-------+
                  |                |
                  |                |
          +-------v------+  +------v-------+
          |              |  |              |
          | ABX          |  | Adapter Host |
          |              |  | Service      |
          +--------------+  +--------------+

This architecture shows:

  1. The fundamental role of Kubernetes as the base platform
  2. The dependency of all components on infrastructure (PostgreSQL, RabbitMQ, etcd)
  3. The key role of Identity Service and CSP for other services
  4. The higher layers of specialized services (ABX, Adapter Host)

Communication Flow Between Components

Communication between components relies on several mechanisms:

  1. REST API – Used by most services:

    curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer '$CSP_AUTH_TOKEN

  2. RabbitMQ Message Queues – For asynchronous communication:

    credential_add_from_command "rabbitmqConfig" /opt/scripts/generate_rmq_config.sh "$(credential_get "rabbitmq")"

  3. Kubernetes Secrets – For credential distribution:

    kubectl -n prelude create secret generic db-credentials ...

  4. Kubernetes ConfigMaps – For configuration distribution:

    kubectl -n "$NAMESPACE_PRELUDE" create configmap identity-clients \
     --from-literal=clients="${identity_managed_clients:2}"

  5. etcd – For storing and sharing configuration:

    vracli proxy update-etcd

This complex architecture of components and their dependencies is precisely managed by the deploy.sh script, which ensures proper deployment order, configuration, and integration of all elements. Thanks to this, the VMware Aria Automation platform operates as a cohesive, integrated system, despite its internal complexity and modular structure.

Architectural Patterns and Best Practices

The deploy.sh script implements numerous architectural patterns and best practices that ensure reliability, security, and flexibility of the deployment process. This section analyzes key patterns and practices used in the script, which constitute a valuable knowledge source for administrators and developers.

1. Idempotency

Idempotency is a property of operations that can be performed multiple times without changing the result after the first application. The deploy.sh script implements this pattern at many levels:

Idempotent Kubernetes Namespace Creation:

function k8s_create_namespace() {
 local ns="$1"
 if [[ $(kubectl get namespaces --no-headers | cut -f 1 -d ' ' | grep -x "$ns" | wc -l) == 0 ]]; then
 kubectl create namespace "$ns"
 fi
}

This function checks if a namespace already exists and creates it only if it’s missing, allowing multiple calls without errors.
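The same check-then-create pattern works for any resource. Here it is sketched with a local directory standing in for a Kubernetes namespace, so the idempotency is easy to see:

```shell
# Idempotent creation sketch: create the resource only if it is missing,
# so repeated calls are harmless no-ops.
create_once() {
  if [ ! -d "$1" ]; then
    mkdir "$1"
  fi
}
ns_dir="$(mktemp -d)/prelude"
create_once "$ns_dir"
create_once "$ns_dir"   # second call changes nothing and raises no error
```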

Idempotent Helm Operations:

helm-upstall() {
 # ... initialization code ...
 
 /opt/scripts/helm-upstall --namespace="$3" --release-name="$release_name" --chart-path="$service_name" --set-string="$2" --set="$4" --timeout="$6" $5 || result=$?
 
 # ... result handling code ...
}

The helm-upstall function combines upgrade and install operations, ensuring that a chart will be installed if it doesn’t exist, or updated if it already exists.

Idempotent Credential Generation:

credential_add_from_command() {
 local key=$1
 shift

 if [ "$1" == "--force" ]; then
 shift
 elif credential_exists "$key"; then
 return 0
 fi

 # ... credential generation and addition ...
}

This function checks if a credential already exists and generates a new one only if it’s missing or forcing is enabled (the --force option).

Benefits of idempotency:

  • Ability to safely repeat deployment operations
  • Resilience to interruptions and script restarts
  • Ease of fixing partially completed deployments
  • Reduction of errors during updates
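The credential idempotency above, including the --force escape hatch, can be reproduced with a file-backed store playing the role of the Kubernetes secret:

```shell
# File-backed sketch of the credential_add_from_command flow: skip keys
# that already exist unless --force is given.
store="$(mktemp -d)"
credential_add() {
  key=$1; shift
  if [ "$1" = "--force" ]; then
    shift
  elif [ -e "$store/$key" ]; then
    return 0                      # already present: keep the old value
  fi
  "$@" > "$store/$key"
}
credential_add demo echo first
credential_add demo echo second          # skipped: key already exists
credential_add demo --force echo third   # --force overwrites it
```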

2. Separation of Concerns

The deploy.sh script implements the separation of concerns pattern, dividing functionality into independent, specialized modules:

Using Specialized Helper Scripts:

source /opt/scripts/persistence_utils.sh
source /opt/scripts/db_utils.sh
/opt/scripts/prepare_certs.sh
/opt/scripts/apply_certs.sh
/opt/scripts/generate_credentials.sh
/opt/scripts/register_vro_endpoint.sh

Each script focuses on one specific task, which improves readability, simplifies maintenance, and enables code reuse.

Structural Logic Organization:

log_stage "Creating kubernetes namespaces"
# ... namespace creation ...

log_stage "Applying ingress certificate"
# ... certificate handling ...

log_stage "Deploying infrastructure services"
# ... service deployment ...

The script is organized into logical sections, each with a clear purpose, which facilitates understanding the flow and debugging.

Using Functions to Encapsulate Logic:

function k8s_delete_namespace() {
 # ... complex namespace deletion logic ...
}

function backup_db_before_destroy() {
 # ... database backup logic ...
}

Defining functions that encapsulate complex logic improves modularity, readability, and possibility of code reuse.

Benefits of separation of concerns:

  • Easier understanding and code maintenance
  • Ability to test and develop individual components independently
  • Better dependency management
  • Flexibility in adapting or replacing modules

3. Error Handling

The deploy.sh script implements layered and resilient error handling that ensures deployment process reliability:

Signal Traps:

trap on_exit EXIT

The on_exit function is called when the script ends (regardless of the reason), ensuring proper cleanup and diagnostics even in case of failure.
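The behavior is easy to demonstrate in miniature; in this sketch a subshell keeps the trap contained, and the handler fires even though the last command fails:

```shell
# Trap-based cleanup sketch: the EXIT handler runs on every exit path,
# successful or not, just like on_exit in deploy.sh.
run_step() (
  on_exit() { echo "cleanup"; }
  trap on_exit EXIT
  echo "work"
  false            # even a failing final command still triggers the trap
)
out="$(run_step)" || true
echo "$out"
```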

Controlled Termination:

die() {
 local msg=$1
 local exit_code=$2

 if [ $# -lt 2 ]; then
 exit_code=1
 fi

 set +x
 clear || true
 echo $msg
 exit $exit_code
}

The die function ensures controlled termination with an informative message and appropriate exit code.

Selective Error Ignoring:

vracli ntp show-config || true
kubectl patch vaconfig prelude-vaconfig --type json -p '[...]' || true

The || true operator ensures that the script will continue even if certain commands end with an error.

Retry Mechanisms:

retry_backoff "5 15 45" "Failed to load existing vRO config" "load_existing_config"

The retry_backoff function repeatedly tries to perform an operation with increasing delays, ensuring resilience to temporary problems.
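A hypothetical sketch of such a helper, modeled on the call above (the real implementation ships inside the appliance scripts), looks like this:

```shell
# Retry-with-backoff sketch: try the command once per delay in the list,
# sleeping between failed attempts, and report the message if all fail.
retry_backoff() {
  delays=$1; msg=$2; cmd=$3
  for d in $delays; do
    if eval "$cmd"; then return 0; fi
    sleep "$d"
  done
  echo "$msg" >&2
  return 1
}
attempts=0
flaky() { attempts=$((attempts + 1)); [ "$attempts" -ge 3 ]; }  # succeeds on try 3
retry_backoff "0 0 0" "Failed to load existing vRO config" flaky
```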

Benefits of advanced error handling:

  • Increased deployment process reliability
  • Better user experience due to informative messages
  • Automatic diagnostics and log package generation
  • Resilience to temporary infrastructure problems

4. Automation

The deploy.sh script is an excellent example of complete automation of a complex deployment process:

Automatic Detection and Configuration:

if output=$(vracli vidm); then
 identity_profile=vidm
 # ... vIDM configuration ...
elif ldap=$(kubectl get vaconfigs.prelude.vmware.com prelude-vaconfig -o json | jq -e .spec.ldap); then
 identity_profile=ldap
 # ... LDAP configuration ...
fi

The script automatically detects the available identity system and adjusts further actions.

local database_directories=(/data/db/p-*)
if [[ -d "/data/db/live" ]]
then
 multi_db_previous=false
elif [[ -d "${database_directories[0]}/live" ]]
then
 multi_db_previous=true
fi

The script also automatically detects the previous database configuration and initiates migration if needed.

Parallel Operation Execution:

helm-upstall endpoint-secrets "..." "$NAMESPACE_PRELUDE" &
helm-upstall no-license "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &
helm-upstall rabbitmq-ha "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &

The & symbol runs processes in the background, allowing parallel execution of independent operations and significantly speeding up deployment.

echo "$allServices" | xargs -t -n 1 -P 0 -I % bash -c "helm-toggle-state % '$VALUES' '$NAMESPACE_PRELUDE' ..."

Using xargs -P 0 enables parallel operations for multiple services.
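The fan-out pattern behind these `&` calls can be sketched with stub steps; `wait` is the barrier the script reaches before continuing:

```shell
# Parallel-deployment sketch: independent steps run as background jobs,
# and wait blocks until every one of them has finished.
out="$(mktemp -d)"
deploy_step() { sleep 0.1; echo "$1 deployed" > "$out/$1"; }
deploy_step rabbitmq-ha &
deploy_step no-license &
deploy_step endpoint-secrets &
wait   # all three stub deployments have completed past this point
```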

Benefits of automation:

  • Elimination of human errors
  • Significant deployment process acceleration
  • Repeatability and consistency of deployments
  • Possibility of integration with CI/CD systems

5. Configuration Flexibility

The deploy.sh script offers many mechanisms for adapting the deployment process to different needs:

Command-Line Parameters:

displayHelp() {
 echo "Deploy or re-deploy all Prelude services"
 echo ""
 echo "Usage:"
 echo "./deploy.sh [Options]"
 echo ""
 echo "Options:"
 echo "-h --help Display this message."
 echo "--deleteDatabases Delete postgres databases of all services."
 # ... many other options ...
}

An extensive command-line option system allows detailed customization of the deployment process.
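A minimal sketch of the option-parsing loop behind such flags (the parsing style is illustrative; only --deleteDatabases from the help text is used here):

```shell
# Option-parsing sketch: walk the argument list and set feature flags.
DELETE_DATABASES=false
parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --deleteDatabases) DELETE_DATABASES=true ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}
parse_args --deleteDatabases
```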

System Profiles:

for profile in "$PRELUDE_PROFILE_ROOT"/*; do
 export PRELUDE_PROFILE_PATH="$profile"
 profile_name="${profile##*/}"

 if "$profile"/check; then
 echo "Profile $profile_name: enabled" >&2
 # ... profile application ...
 fi
done

The system profile mechanism allows extending functionality without modifying the main script.

Feature Flags:

if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
 /opt/scripts/register_abx_endpoint.sh
fi

if [[ "$ENABLE_ANALYTICS" == "true" ]]; then
 helm-upstall analytics-collector "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
 helm-upstall analytics-service "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
fi

Feature flags allow enabling or disabling specific components and functionalities.

Benefits of configuration flexibility:

  • Adapting deployment to different environments
  • Enabling or disabling specific features
  • Possibility of extending functionality without modifying the main script
  • Ease of testing new components

6. Security

The deploy.sh script implements numerous security mechanisms:

Generating Unique, Strong Credentials:

credential_add_from_command "postgres" /opt/scripts/generate_pass.sh
credential_add_from_command "identity-encoder-salt" /opt/scripts/generate_encryption_key_base64.sh 32
credential_add_from_command "rsaKey" /opt/scripts/generate_rsa_encryption_key.sh 2048

Each component receives unique, cryptographically strong credentials.

Secure Sensitive Data Storage:

credentials_save() {
 secrets=$(kubectl get secret db-credentials -n prelude -o yaml | base64)
 crdssecret=$(echo "$secrets" | tr -d '\n')
 kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/crdssecret", "value": "'"$crdssecret"'"}]'
}

Credentials are stored in Kubernetes Secrets; the helper above additionally serializes the db-credentials Secret into the prelude-vaconfig custom resource, keeping it alongside the appliance configuration.

Immediate Sensitive File Removal:

rm -f ${CERT_INGRESS_KEY}
rm -rf $SSH_DIR

Files containing private keys are removed immediately after use.
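A plain `rm` at the end of a script is silently skipped if an earlier step aborts, so a common hardening is to register the cleanup in an `EXIT` trap. A sketch, with placeholder paths standing in for the real `$CERT_INGRESS_KEY` and `$SSH_DIR`:

```shell
# Placeholder paths for illustration; the real script removes
# $CERT_INGRESS_KEY and $SSH_DIR.
workdir=$(mktemp -d)
CERT_INGRESS_KEY="$workdir/ingress.key"

cleanup() {
  rm -f "$CERT_INGRESS_KEY"
  rm -rf "$workdir"
}
# fires on normal exit and on early failure under "set -e" alike
trap cleanup EXIT

printf 'FAKE-PRIVATE-KEY\n' > "$CERT_INGRESS_KEY"
# ... use the key here (e.g. to create the TLS secret for the ingress) ...
```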

Sensitive Data Display Control:

set +x
# ... credential operations ...
set -x

Debug tracing (set -x) is switched off while credentials are handled, so secret values never appear in the script's trace output, and re-enabled afterwards.
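One subtlety with this pattern: unconditionally re-enabling tracing with `set -x` is wrong if the caller never had it on. A sketch of a helper that restores the caller's setting instead — `load_secret` and `DB_PASSWORD` are illustrative names, not part of deploy.sh:

```shell
# Illustrative helper: disable xtrace while reading a secret, then restore
# whatever trace setting the caller had. Names are hypothetical.
load_secret() {
  local had_xtrace=0
  if [[ $- == *x* ]]; then had_xtrace=1; fi
  set +x                        # stop tracing before the secret is touched
  DB_PASSWORD=$(cat "$1")       # the value never appears in xtrace output
  if (( had_xtrace )); then set -x; fi
  return 0
}

secret_file=$(mktemp)
echo "s3cr3t" > "$secret_file"

set -x
load_secret "$secret_file"      # trace shows the file name, not the contents
set +x

rm -f "$secret_file"
echo "loaded a ${#DB_PASSWORD}-character credential"
```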

Benefits of built-in security mechanisms:

  • Sensitive data protection
  • Unique credentials for each deployment
  • Secure storage and distribution of secrets
  • Minimizing data exposure risk

These architectural patterns and best practices make the deploy.sh script not only an effective deployment tool but also a valuable source of knowledge about advanced automation techniques, security, and complex system management. Many of these patterns can be applied to other projects, not necessarily related to VMware Aria Automation.

Summary

The deploy.sh script in VMware Aria Automation is an extremely advanced tool that serves as an excellent example of best practices in automation, DevOps, and infrastructure management. Its in-depth analysis reveals how modern, complex systems can be deployed in a reliable, secure, and flexible manner.

A deeper understanding of the deploy.sh script operation not only helps more efficiently manage the VMware Aria Automation environment but also constitutes a valuable lesson in advanced automation techniques, complex system management, and DevOps best practice implementation. Many patterns and approaches used in it can be adapted to other projects and systems, especially those based on containerization and microservices.

In a world where IT infrastructure complexity constantly grows, tools like deploy.sh become not just useful but essential for ensuring reliability, security, and operational efficiency of modern technology platforms.
