Introduction to the deploy.sh Script
The deploy.sh script is a fundamental tool in the VMware Aria Automation ecosystem (formerly vRealize Automation), responsible for deploying, configuring, and managing all components of this advanced environment. Located in the /opt/scripts/
directory on the Aria Automation virtual machine, it serves as the central orchestration point for the entire system.
The main tasks of the deploy.sh script include:
Environment Initialization – The script prepares all necessary Kubernetes resources, namespaces, and infrastructure components. It creates the basic data structures that will be used by other system components. This stage is fundamental to the entire process as it establishes the foundation on which subsequent application layers will be built.
Component Configuration and Deployment – Deploy.sh manages the installation and configuration of dozens of microservices that together form the Aria Automation ecosystem. Each component has specific configuration requirements, dependencies, and runtime parameters that the script manages in an automated manner. The script uses Helm Charts to describe and deploy each component, ensuring repeatability and reliability of the process.
Database Management – The script coordinates the deployment, configuration, and migration of PostgreSQL databases. It handles both scenarios with a single database instance (single-DB) and more advanced configurations with dedicated databases for each service (multi-DB). Automated backup mechanisms, data migration, and integrity verification ensure data security during updates or system reconfiguration.
Distributed Service Synchronization – In microservice architecture, proper synchronization and communication between components is critical. The deploy.sh script manages the configuration of communication systems (RabbitMQ), service registration, endpoint establishment, and routing configuration, ensuring a cohesive ecosystem of cooperating services.
Comprehensive Lifecycle Management – Deploy.sh not only deploys new instances but also manages the full application lifecycle – from initial installation, through updates and reconfigurations, to controlled shutdown. The script implements idempotent operations that can be safely repeated without risk of system damage.
The deployment process using the deploy.sh script typically begins after the Aria Automation virtual machine starts. The administrator executes the script, which initiates the installation process for all necessary services and components, using Kubernetes as the container orchestration platform and Helm as the package manager.
In situations requiring environment restart, configuration cleanup, or data migration, the deploy.sh script offers a range of configuration options. For example, when it is necessary to stop services and clean the environment, you can use the `--shutdown` option (or the deprecated `--onlyClean` option). The recommended sequence of actions in such a case includes:
Stopping services – First, stop all active services using the dedicated svc-stop.sh script:
/opt/scripts/svc-stop.sh --force
Waiting period – Then it’s recommended to wait about 120 seconds, which gives time for safe termination of all processes and resource release:
sleep 120
Environment cleanup – After this period, you can run the deploy.sh script with the cleanup option:
/opt/scripts/deploy.sh --shutdown
Redeployment – After the cleanup process is complete, you can redeploy all services by running the script without additional parameters:
/opt/scripts/deploy.sh
This sequence ensures safe and controlled stopping, cleaning, and restarting of the entire environment, minimizing the risk of problems related to improper process termination or remnants of previous configuration.
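For convenience, the four steps above can be collected into a small wrapper script. The sketch below simply chains the documented commands and the 120-second pause; the wrapper's name is an arbitrary, illustrative choice and not part of the product:

```bash
#!/bin/bash
# Illustrative wrapper around the documented stop/clean/redeploy sequence
set -euo pipefail

/opt/scripts/svc-stop.sh --force   # 1. stop all active services
sleep 120                          # 2. give processes time to terminate and release resources
/opt/scripts/deploy.sh --shutdown  # 3. clean the environment
/opt/scripts/deploy.sh             # 4. redeploy all services
```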
Detailed Architecture of the deploy.sh Script
1. Help Configuration and Advanced Argument Handling
The deploy.sh script begins by defining the `displayHelp()` function, which serves as interactive documentation for the tool, presenting all available options with their descriptions. This function is key for users as it allows them to quickly familiarize themselves with the script's capabilities without having to analyze its source code.
displayHelp() {
echo "Deploy or re-deploy all Prelude services"
echo ""
echo "Usage:"
echo "./deploy.sh [Options]"
echo ""
echo "Options:"
echo "-h --help Display this message."
echo "--deleteDatabases Delete postgres databases of all services."
echo "--shutdown Shutdown gracefully all services."
echo "--withHttpProxy Enable Http proxy and route all outgoing service traffic to it. Proxy port: 30128. Proxy web console port: 30333."
echo "--onlyClean Deprecated. Use --shutdown instead."
echo "--quick Internal use only. Reduce/eliminates some internal timeouts."
echo "--multiDb Deploy a separate DB server for each service. In cluster deployments DB pods with primary roles are distributed evenly across all nodes."
echo "--legacyEndpointRegistration Create default endpoints using deploy script instead of provisioning service."
echo "--enableAdapterHostSvc Allowed values: true or false. Enable/disable adapter-host-service as an adapters host."
# Additional undocumented options...
}
After defining the help function, the script implements a command-line argument processing mechanism. This mechanism is flexible – it handles both short (single-letter) options and long formats, as well as parameters with values passed after the "=" sign. The following code fragment demonstrates this implementation:
while [ "$1" != "" ]; do
PARAM=$(echo $1 | awk -F= '{print $1}')
VALUE=$(echo $1 | awk -F= '{print $2}')
case $PARAM in
-h | --help)
displayHelp
exit
;;
--deleteDatabases)
DELETE_DATABASES=true
;;
--multiDb)
MULTI_DB=true
;;
# ... other options
--enableAdapterHostSvc)
if [[ "$VALUE" == "true" ]]; then
ENABLE_ADAPTER_HOST_SVC=true
else
ENABLE_ADAPTER_HOST_SVC=false
fi
;;
*)
echo "Error: Unknown parameter "$PARAM""
echo ""
displayHelp
exit 1
;;
esac
shift
done
This code fragment uses a `while` loop to iterate through all arguments passed to the script. For each argument:
- It extracts the parameter name and its value using `awk -F=`, which allows handling formats like `--parameter=value`
- Uses a `case` statement to match the parameter to a known option
- Depending on the option, sets the appropriate configuration variable
- For options with values (like `--enableAdapterHostSvc`), analyzes the passed value and sets the variable accordingly
- If the parameter doesn't match any known option, displays an error, shows the help, and exits
Through this mechanism, the script establishes key configuration flags that determine its further operation:
- `DELETE_DATABASES` – controls whether databases should be deleted and initialized anew, which is useful for migration or for troubleshooting corrupted data
- `ENABLE_RESOURCE_LIMITS` – determines whether Kubernetes resource limits (CPU, memory) should be applied, which can affect system performance and stability
- `SHUTDOWN` – decides whether services should be stopped, which is used in environment cleanup scenarios
- `MULTI_DB` – configures whether the system should use dedicated databases for each service, which is recommended in production environments for better isolation and scalability
- `ENABLE_ADAPTER_HOST_SVC` – enables or disables the adapter host service, which is responsible for handling integration adapters
- `ENABLE_EXTENSIBILITY_SUPPORT` – controls support for extensions, allowing platform customization for specific organizational needs
It's worth noting that the script also handles deprecated options (marked as "Deprecated"), maintaining backward compatibility with earlier versions or existing automation scripts. For example, the `--onlyClean` option is deprecated but still supported, and the script directs the user to the newer `--shutdown` option.
The argument processing mechanism not only sets internal configuration variables but also implements input validation – it checks the correctness of values for parameters such as `--enableAdapterHostSvc`, which expect specific values (true/false). This prevents errors resulting from incorrect input data and ensures configuration consistency.
The completion of the argument processing section establishes a complete configuration profile for the current script invocation, which determines all subsequent operations and decisions made during the deployment process. This allows the administrator to precisely customize the deployment process to specific environment needs and requirements.
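A few example invocations, built only from the options documented in the help text above, illustrate how these flags combine in practice:

```bash
# Standard deployment with default settings
/opt/scripts/deploy.sh

# Production-style deployment with a dedicated database server per service
/opt/scripts/deploy.sh --multiDb

# Enable the adapter host service and route outgoing traffic through the built-in HTTP proxy
/opt/scripts/deploy.sh --enableAdapterHostSvc=true --withHttpProxy

# Destructive redeployment that deletes and re-initializes all service databases
/opt/scripts/deploy.sh --deleteDatabases
```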
2. Advanced Logging System
The logging system implemented in the deploy.sh script is an example of a well-designed, multi-layered mechanism that not only documents the deployment process but also serves as an invaluable diagnostic tool.
The first step in configuring the logging system is establishing unique, time-stamped log files and rotation mechanisms:
log_timestamp=$(date --utc +'%Y-%m-%d-%H-%M-%S')
if [[ -f /var/log/deploy.log ]] && [[ ! -h /var/log/deploy.log ]]; then
mv /var/log/deploy.log /var/log/deploy-old.log
fi
exec > >(tee -a "/var/log/deploy-$log_timestamp.log") 2>&1
ln -sfT "deploy-$log_timestamp.log" /var/log/deploy.log
This code fragment performs several key actions:
Timestamp Generation – The script creates a unique timestamp based on the current UTC time in year-month-day-hour-minute-second format, ensuring that each log file has a unique name, facilitating identification and organization of historical logs.
Handling Existing Logs – The script checks if the `/var/log/deploy.log` file already exists and is not a symbolic link. If so, it moves it to `deploy-old.log`, preserving the previous log as a backup. This prevents the loss of important information from previous runs.
Output Redirection – Using the advanced construction `exec > >(tee -a "log_file") 2>&1`, the script redirects both standard output (stdout) and the error stream (stderr) to the `tee` command, which simultaneously:
- Displays all messages on the console (allowing the administrator to monitor progress)
- Saves the same messages to the log file (the `-a` option adds data to the file instead of overwriting it)
Symbolic Link Creation – The script creates a symbolic link `/var/log/deploy.log` pointing to the newest log file, providing a constant, predictable access point to current information, regardless of the unique file name. A short usage example based on this layout is shown below.
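In day-to-day use, this layout means an administrator can always follow the live deployment through the symlink and compare it with earlier runs, for example:

```bash
# Follow the current deployment in real time (the symlink always points to the newest log)
tail -f /var/log/deploy.log

# List historical deployment logs, newest first
ls -1t /var/log/deploy-*.log
```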
After configuring the basic logging infrastructure, the script defines the `log_stage()` function, which introduces a hierarchical structure to the logs:
log_stage() {
set +x
echo
echo "========================="
echo "[$(date "+%Y-%m-%d %H:%M:%S.%3N%z")] $@"
echo "========================="
echo
set -x
}
This elegant function:
Temporarily Disables Debug Mode – The `set +x` instruction turns off command display, making section headers more readable in logs.
Formats a Clear Separator – The function creates a visually distinctive block with horizontal lines, making it easier to browse long logs and quickly locate the beginning of each section.
Adds a Precise Timestamp – Date and time are displayed with millisecond precision and timezone information, which is invaluable for performance analysis and diagnosing time-related issues.
Passes the Message – The function displays the passed message describing the section that is beginning.
Restores Debug Mode – After printing the header, the `set -x` instruction restores debug mode, in which all executed commands are displayed.
The use of this function throughout the script creates a clear, hierarchical log structure where each main stage of the deployment process is clearly marked. For example:
=========================
[2023-05-15 14:30:27.123 +0000] Creating kubernetes namespaces
=========================
+ k8s_create_namespace ingress
...
=========================
[2023-05-15 14:31:15.456 +0000] Applying ingress certificate
=========================
+ /opt/scripts/prepare_certs.sh
...
This structure significantly facilitates both manual log browsing and automatic analysis using parsing tools.
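For instance, because every stage header carries a bracketed timestamp, a single grep is enough to pull a rough timeline of the deployment out of the log:

```bash
# Stage headers start with a bracketed timestamp, so listing them
# yields a quick timeline of the deployment stages
grep -E '^\[[0-9]{4}-[0-9]{2}-[0-9]{2} ' /var/log/deploy.log
```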
Additionally, the logging system works with the error handling mechanism, ensuring that in case of deployment failure, a complete diagnostic package is automatically generated:
on_exit() {
if [ $? -ne 0 ]; then
echo "Deployment failed. Collecting log bundle ..."
( cd /root; vracli log-bundle )
fi
# ... other cleanup operations
}
trap on_exit EXIT
The `vracli log-bundle` tool invoked in case of error creates a comprehensive package containing not only deploy.sh script logs but also:
- Logs of all Kubernetes system components
- Configuration of Kubernetes resources (pods, services, deployments, secrets)
- Information about database and service status
- Network and connection configuration
- Resource metrics (CPU, memory, disk)
This multi-layered logging system forms the foundation of diagnostic processes and problem-solving in the Aria Automation environment, providing:
- Clear documentation of the deployment process
- Precise tracking of occurring issues
- Historical analysis of previous deployments
- Automatic generation of complete diagnostic packages
- Hierarchical structure facilitating analysis
3. Comprehensive Error Handling and Safety Mechanisms
The deploy.sh script implements a multi-layered, well-thought-out system of error handling and safety mechanisms that ensures the reliability of the deployment process even in the event of unforeseen problems. This system consists of several key components:
The central element of error handling is the `on_exit` function, which is called automatically when the script ends, regardless of the reason:
on_exit() {
if [ $? -ne 0 ]; then
echo "Deployment failed. Collecting log bundle ..."
( cd /root; vracli log-bundle )
fi
# Remove temporary helm_upstall check directory
if [ -n "$UPSTALL_STATUS_DIR" ]; then
clear-helm-upstalls-status $UPSTALL_STATUS_DIR true
fi
# Clear the value of property cache.timeout in vracli.conf file
# Do not generate new service status
vracli service status --unset-config service.status.cache.lifetime || true
rm -rf /tmp/deploy.tmp.*
}
trap on_exit EXIT
This function performs several important tasks:
Error Detection – It checks the exit code of the last command (`$?`). If it's non-zero (indicating an error), it initiates the diagnostic procedure.
Automatic Diagnostics – In case of an error, it generates a complete diagnostic package using `vracli log-bundle`. This package contains not only logs but also detailed information about system state, which is invaluable during problem analysis.
Temporary Resource Cleanup – Regardless of the outcome, the function ensures the removal of temporary files and directories (`UPSTALL_STATUS_DIR`, `/tmp/deploy.tmp.*`), preventing garbage from being left in the system.
Cache Configuration Reset – It restores default cache settings for service status, ensuring that the next call to `vracli service status` will generate fresh data.
The `trap on_exit EXIT` instruction registers this function as a handler for the EXIT signal, meaning it will be called regardless of whether the script ends normally or prematurely (e.g., due to an error or user interruption).
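The mechanics of an EXIT trap can be demonstrated in isolation with a few lines. This stand-alone illustration (not part of deploy.sh) shows that the handler fires and can read the exit status on both success and failure:

```bash
#!/bin/bash
# Minimal illustration of an EXIT trap: the handler runs no matter how the script ends
cleanup() {
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "script is exiting with status $status - collect diagnostics here"
    fi
    echo "cleanup always runs"
}
trap cleanup EXIT

echo "doing work"
false   # the script ends with exit code 1, and the trap still fires
```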
Additionally, the script defines the `die()` function, which provides controlled termination in case a critical error is detected:
die() {
local msg=$1
local exit_code=$2
if [ $# -lt 2 ]; then
exit_code=1
fi
set +x
clear || true
echo $msg
exit $exit_code
}
This function:
Accepts an Error Message – The first argument is a human-readable problem description to be displayed.
Accepts a Custom Exit Code – The second, optional argument allows specifying an error code (default is 1), which can be used by automation systems to differentiate error types.
Disables Debug Mode – The `set +x` instruction ensures that the error message will be clearly visible, without mixing with debugging output.
Clears the Screen – The `clear || true` command clears the screen (if possible), increasing the visibility of the error message.
Displays the Message and Exits – It prints the error message and ends the script with the specified exit code.
The `die()` function is strategically used at key points in the script where error conditions can be detected. For example:
if ! vracli status first-boot -w 300; then
die "Timeout expired"
fi
In this case, if the Kubernetes cluster doesn’t reach the “first-boot” state within 300 seconds, the script will be safely interrupted with a readable “Timeout expired” message.
In addition to these main mechanisms, the script uses a number of advanced error handling techniques:
- Timeout Control – For long-running operations, the script applies the `timeout` command, which automatically interrupts the operation if it exceeds a specified time:
timeout 300s bash -c wait_deploy_health
- Retry Mechanisms – For operations that may temporarily fail (e.g., due to network delays), the script uses the `retry_backoff` function from the `retry_utils.sh` module:
retry_backoff "5 15 45" "Failed to load existing vRO config" "load_existing_config"
This function tries to perform the operation, and in case of failure, waits a specified time (5, 15, 45 seconds) before subsequent attempts.
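The retry_utils.sh module itself is not shown in this article; based on the call above, a helper of roughly this shape can be assumed – a list of back-off delays, an error message, and the command to run. This is an illustrative reconstruction, not the actual VMware implementation:

```bash
# Assumed sketch matching the call retry_backoff "5 15 45" "message" "command"
retry_backoff() {
    local delays="$1"     # space-separated waits between attempts
    local error_msg="$2"  # message to print if every attempt fails
    local cmd="$3"        # command (or exported function) to execute

    if eval "$cmd"; then
        return 0
    fi
    for delay in $delays; do
        sleep "$delay"
        if eval "$cmd"; then
            return 0
        fi
    done
    echo "$error_msg" >&2
    return 1
}
```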
- Error Handling in Parallel Processes – For operations performed in parallel (in the background), the script implements a status checking mechanism:
check-helm-upstalls-status() {
# ... code checking status
if [ "${failure_count}" -gt "0" ]; then
log_stage "There are failed install/upgrade of helm releases"
return 1
fi
}
- Selective Error Ignoring – In some cases, the script deliberately ignores specific errors when they are not critical:
vracli ntp show-config || true
The `|| true` operator means that failure of the `vracli ntp show-config` command will not cause the script to terminate.
- Dynamic Adaptation to Error Conditions – In some scenarios, the script takes specific remedial actions instead of simply terminating:
if [[ "$retry_count" -gt 5 ]]; then
log_stage "Too many retries. Attempting recovery procedure..."
# ... recovery code
fi
All of these mechanisms create a layered, resilient system that provides:
- Reliable detection and reporting of errors
- Automatic diagnostics and problem information collection
- Controlled termination in case of critical errors
- Intelligent retry of operations in case of temporary problems
- Proper resource cleanup, even in case of failure
- Support for parallel operation execution with safety preserved
Such comprehensive error handling is key to the reliable deployment of complex systems like VMware Aria Automation, where the installation process includes many interdependent components and can be susceptible to various problems – from temporary network failures to resource issues to configuration conflicts.
4. Multi-layered Environment State Checking
One of the key aspects of the deploy.sh script is the implementation of advanced environment state checking mechanisms that ensure all components are properly prepared before starting significant deployment operations. This multi-layered system of environment state verification is essential for ensuring stability and predictability of the deployment process.
The central element of this system is the `wait_deploy_health` function, which performs cyclical health checks until a proper state is achieved:
# Run a health check with the deploy profile
wait_deploy_health() {
while true; do
/opt/health/run-once.sh deploy && break || sleep 5
done
}
export -f wait_deploy_health
log_stage "Waiting for deploy healthcheck"
timeout 300s bash -c wait_deploy_health
This code fragment performs the following tasks:
- Cyclical Check Function Definition – The `wait_deploy_health` function implements an infinite loop that:
  - Calls the `/opt/health/run-once.sh` script with the "deploy" profile
  - If the script ends successfully (exit code 0), breaks the loop (`break`)
  - Otherwise, waits 5 seconds before the next attempt
- Function Export to Subprocesses – The `export -f wait_deploy_health` instruction allows using this function in subprocesses, which is necessary for operation with the `timeout` command
- Check Start Logging – The `log_stage` function documents the beginning of the health check waiting process
- Time Limit – The `timeout 300s` command establishes a 5-minute time limit, after which, if the health check still fails, the operation will be interrupted
The `/opt/health/run-once.sh` script used in this process is a complex diagnostic tool that performs a series of specialized tests:
#!/bin/bash -l
PATH=$PATH:/sbin:/usr/sbin
if [ -z "${1}" ]; then
echo "Health check profile required"
exit 1
fi
rundir=$( mktemp -d )
cd "${rundir}"
# Run the requested health checks concurrently
/usr/bin/make --file=/opt/health/Makefile --jobs --keep-going --output-sync=target "${1}"
err=$?
cd ..
rm -rf "${rundir}"
exit "${err}"
This script:
- Requires a health check profile to be provided as an argument
- Creates a temporary working directory
- Uses the Make system (with a Makefile) to run multiple health tests in parallel
- The `--jobs` option allows parallel test execution, speeding up the process
- The `--keep-going` flag ensures that even if some tests fail, the others will still be executed
- The `--output-sync=target` parameter ensures that output from parallel processes will not mix
In the `/opt/health/Makefile` file, various test profiles are defined, including the "deploy" profile, which checks key aspects of the environment (a hypothetical example of such a check is sketched after the list):
- Kubernetes API availability
- Basic cluster component status
- Network configuration
- System resource availability
- Infrastructure service status
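The Makefile itself is not reproduced in this article, so the following is only an illustrative sketch of the kind of individual check a "deploy" profile target might run. It reuses commands that appear elsewhere in the deployment scripts (`kubectl get nodes`, `vracli cluster etcd health`); the grouping and script itself are assumptions, not the actual VMware implementation:

```bash
#!/bin/bash
# Hypothetical stand-alone health check in the spirit of the "deploy" profile:
# exit 0 only when the Kubernetes API answers and etcd reports healthy.
kubectl get nodes > /dev/null 2>&1 || exit 1
vracli cluster etcd health > /dev/null 2>&1 || exit 1
exit 0
```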
After the basic health check is complete, the script continues verification by checking Kubernetes cluster readiness:
# Wait for K8s to be ready before proceed
# Approximately 5 minutes of timeout before failing
if ! vracli status first-boot -w 300; then
die "Timeout expired"
fi
This command:
- Calls `vracli status first-boot` with the `-w 300` parameter, meaning wait up to 300 seconds (5 minutes) for cluster readiness
- If after this time the cluster is still not ready, the script calls the `die` function with the "Timeout expired" message, which causes controlled termination of the deployment process
Environment state verification is not limited only to the initial stages – the script contains checkpoints distributed throughout the entire deployment process. For example, before deploying infrastructure services:
log_stage "Deploying infrastructure services"
# ... environment preparation ...
# Check if Kubernetes API is available
kubectl get nodes &> /dev/null || {
echo "Kubernetes API is not responding"
exit 1
}
# Verify etcd availability
vracli cluster etcd health || {
echo "etcd is not healthy"
exit 1
}
# ... continue deployment ...
After deploying key components, the script again verifies system state:
log_stage "Verifying core services"
# Wait for core services readiness
timeout 300s bash -c 'until kubectl -n prelude get pods | grep identity-service | grep -q Running; do sleep 5; done'
timeout 300s bash -c 'until kubectl -n prelude get pods | grep rabbitmq-ha | grep -q Running; do sleep 5; done'
# Check service status
vracli service status | grep -E "identity-service|rabbitmq-ha" | grep -qv Running && {
echo "Core services are not running"
exit 1
}
Such multi-layered verification ensures that:
- The base environment (Kubernetes, network, resources) is properly configured
- Basic infrastructure components are available and working correctly
- Key services have reached the “Running” state before continuing deployment
- The process will not continue if problems are detected, preventing inconsistent states
Additionally, the script implements verification mechanisms specific to individual components, for example for databases:
log_stage "Verifying database health"
# Check primary node availability for each database
for db in ${databases[@]}; do
if ! vracli db status --dbname "$db" | grep -q "Primary node: Available"; then
echo "Database $db primary node is not available"
exit 1
fi
done
# Verify replicas (in multi-DB mode)
if [[ "$MULTI_DB" == "true" ]]; then
for db in ${databases[@]}; do
if ! vracli db status --dbname "$db" | grep -q "Replicas: 2/2"; then
echo "Database $db replicas are not fully available"
exit 1
fi
done
fi
This comprehensive environment state verification system forms the foundation of a stable deployment process, ensuring that:
- Each stage begins in a known, predictable state
- Problems are detected as early as possible, before they cause cascading failures
- The deployment process is deterministic and repeatable
- The administrator receives clear messages about any problems
- The environment is not left in an inconsistent state in case of failure
Thanks to these mechanisms, the deploy.sh script can reliably deploy the complex VMware Aria Automation environment, even in variable environmental conditions or unstable base infrastructure.
5. Advanced Database Configuration and Intelligent Backup Creation
Database management is one of the most advanced aspects of the deploy.sh script. This section implements complex mechanisms for handling various deployment scenarios, data migration, and ensuring high availability. The script uses the `db_utils.sh` module, which contains specialized functions for managing PostgreSQL databases in a container environment:
source /opt/scripts/db_utils.sh
log_stage "Backing up databases from existing pods"
# The backup_db_before_destroy function performs database backup before destroying existing data
backup_db_before_destroy "$MULTI_DB" "$DELETE_DATABASES" "$SHUTDOWN" "$NAMESPACE_PRELUDE"
The `backup_db_before_destroy` function implements complex decision logic to determine whether a data backup is required and in what mode:
backup_db_before_destroy()
{
local multi_db="$1"
local delete_databases="$2"
local shutdown="$3"
local namespace="$4"
local multi_db_previous=""
# If we're doing shutdown or deleting databases, migration is not required
if [ "$shutdown" == "true" -o "$delete_databases" == "true" ]
then
export MULTI_DB_MIGRATE=false
return 0
fi
# Detecting previous database configuration
local database_directories=(/data/db/p-*)
if [[ -d "/data/db/live" ]]
then
multi_db_previous=false
elif [[ -d "${database_directories[0]}/live" ]]
then
multi_db_previous=true
else
export MULTI_DB_MIGRATE=false
return 0
fi
# Check if there's been a change in mode (multi_db)
if [ "$multi_db_previous" != "$multi_db" ]
then
export MULTI_DB_MIGRATE=true
else
export MULTI_DB_MIGRATE=false
return 0
fi
# Prepare backup directory and perform data dump
vracli cluster exec -- bash -c 'mkdir -p /data/db/migrate'
export MULTI_DB_BACKUP=$(mktemp -d --dry-run /data/db/migrate/XXX)
vracli cluster exec -- bash -c "mkdir -p ${MULTI_DB_BACKUP}"
dump_all_databases "$namespace" "$MULTI_DB_BACKUP"
}
This code fragment contains advanced logic:
Parameter Analysis – The function analyzes passed parameters (multi_db, delete_databases, shutdown) to determine the action strategy.
Automatic Database Topology Detection – By checking directory structures on disk (`/data/db/live` for single-DB or `/data/db/p-*/live` for multi-DB), the function automatically determines whether the previous deployment used single-DB or multi-DB configuration.
Mode Change Detection – Comparing the detected configuration with the requested one (the `multi_db` parameter) allows determining if there has been a mode change that requires data migration.
Migration Preparation – If a mode change is detected, the function:
- Creates a directory for migration (`/data/db/migrate`)
- Generates a unique name for the backup directory
- Performs a dump of all databases
The `dump_all_databases` function is responsible for creating backups of all databases:
function dump_all_databases()
{
namespace="$1"
backup_dir="$2"
for database in $(kubectl get configmap db-settings -n ${namespace} -o json | jq -r ".data| keys[]"| grep -v "postgres" | grep -v "repmgr-db")
do
dump_database "$database" "$backup_dir"
done
}
function dump_database()
{
local database="$1"
local backup_dir="$2"
vracli cluster exec -- bash -c "vracli db dump ${database} > ${backup_dir}/${database}.sql || rm ${backup_dir}/${database}.sql"
}
This function:
- Gets a list of databases from the `db-settings` ConfigMap, skipping the `postgres` and `repmgr-db` databases, which are system databases
- For each database calls the `dump_database` function, which:
  - Performs a database dump to an SQL file
  - In case of error, removes the file to prevent trying to restore a corrupted backup
After creating backups and potentially deleting existing databases, the script initiates the database deployment process, using the `upstall_postgres` function:
function upstall_postgres()
{
local multi_db="$1"
local migrate="$2"
local backup_dir="$3"
local values="$4"
local namespace="$5"
local upstall_status_dir="$6"
# Database deployment
deploy_databases "$multi_db" "$values" "$namespace" "$upstall_status_dir"
# Data migration, if required
if [[ "$migrate" == "true" ]]
then
migrate_stored_data "$namespace" "$backup_dir"
fi
# Cleanup, depending on mode
if [[ "$multi_db" == "true" ]]
then
helm-upstall postgres-measurer "" "$namespace"
vracli cluster exec -- bash -c "rm -rf /data/db/live; rm -rf /data/db/backup; rm -rf /data/db/flags"
else
vracli cluster exec -- bash -c "rm -rf /data/db/p-*"
fi
}
The `deploy_databases` function is responsible for parallel database deployment, which significantly speeds up the process:
function deploy_databases()
{
local multi_db="$1"
local values="$2"
local namespace="$3"
local upstall_status_dir="$4"
local databases=$(kubectl get configmap db-settings -n ${namespace} -o json | jq -r ".data| keys[]"| grep -v "postgres" | grep -v "repmgr")
if [[ "$multi_db" == false ]]
then
databases=("postgres")
fi
for database in ${databases[@]}
do
deploy_database "$database" "$values" "$namespace" "$upstall_status_dir"
done
wait
}
This function:
- In single-DB mode, uses only one database (`postgres`)
- In multi-DB mode, gets a list of all required databases from the ConfigMap
- For each database runs the `deploy_database` function in the background (asynchronously)
- At the end calls `wait` to wait for all parallel processes to complete
The `deploy_database` function configures and deploys a single database:
function deploy_database()
{
local database="$1"
local values="$2"
local namespace="$3"
local upstall_status_dir="$4"
local data_directory_path="/data/db"
local release_name="$database"
if [[ "$database" != "postgres" ]]
then
release_name=$(echo "$release_name" | sed "s/-db//;s/-//g")
release_name="p-${release_name}"
data_directory_path="${data_directory_path}/${release_name}"
values="${values},multiDB=true"
else
values="${values},multiDB=false"
fi
vracli cluster exec -- bash -c "rm -f ${data_directory_path}/live/pg_stat/repmgrd_state.txt"
helm-upstall postgres "${values},releaseName=${release_name},dbName=${database}" "${namespace}" '' '' 7200 CHECK_DIR=${upstall_status_dir} &
}
This function performs the following operations:
- Determines the Helm release name and data directory path, depending on whether it's the main database (`postgres`) or a dedicated service database; for example, a service database named `abx-db` becomes the release `p-abx` with its data under `/data/db/p-abx`
- Clears the repmgrd state file, ensuring proper initialization of the replication cluster
- Calls `helm-upstall` with appropriate parameters, including:
  - Release name (modified if it's a dedicated database)
  - Database name
  - Long timeout (7200 seconds) to ensure sufficient time for initialization
  - Status directory for progress monitoring
In case of data migration (change from single-DB to multi-DB mode or vice versa), the `migrate_stored_data` function restores the saved data:
function migrate_stored_data()
{
local namespace="$1"
local backup_dir="$2"
local databases=$(kubectl get configmap db-settings -n ${namespace} -o json | jq -r ".data| keys[]"| grep -v "postgres" | grep -v "repmgr-db")
for database in ${databases[@]}
do
local backup_file="${backup_dir}/${database}.sql"
if [[ -s ${backup_file} ]]
then
vracli cluster exec -- bash -c "vracli db restore --dbname ${database} ${backup_file} &> /dev/null"
else
exit 1
fi
done
vracli cluster exec -- bash -c "rm -rf ${backup_dir}"
}
This function:
- Gets a list of databases (again skipping the system databases `postgres` and `repmgr-db`)
- For each database checks if the backup file exists and is not empty
- If so, restores the database from the backup using `vracli db restore`
- After completion, removes the backup directory
Additionally, for high availability environments, the script contains functions for monitoring and balancing primary nodes in the cluster:
function get_primaries()
{
database_pods=$(kubectl get pods -n prelude -o custom-columns=:metadata.name,:spec.containers[0].image | grep db-image | cut -d " " -f1 | grep "0")
for pod in ${database_pods[@]}
do
pod_data=$(kubectl exec -n prelude ${pod} -- bash -c "chpst -u postgres repmgr node check --upstream 2>/dev/null")
if [[ "$pod_data" =~ "primary" ]]
then
primary="$pod"
else
primary=$(echo "$pod_data" | sed -r "s/.*upstream.*"([^.]*)..*/1/")
fi
echo "${primary}"
done
}
function draw_table()
{
local pods_in_0=()
local pods_in_1=()
local pods_in_2=()
for pod in $(get_primaries)
do
local node_id=$(echo $pod | grep -Eo "[0-9]+")
if [[ $node_id == 0 ]]
then
pods_in_0+=($pod)
elif [[ $node_id == 1 ]]
then
pods_in_1+=($pod)
else
pods_in_2+=($pod)
fi
done
output=""
for i in {0..30}
do
if [ -n "${pods_in_0[$i]}" -o -n "${pods_in_1[$i]}" -o -n "${pods_in_2[$i]}" ]
then
output="$output${pods_in_0[$i]}, ${pods_in_1[$i]}, ${pods_in_2[$i]}n"
else
break
fi
done
echo -ne $output | column -t -N Node0,Node1,Node2 -o "|" -s ','
}
These functions:
- Identify the primary node for each database
- Group them by cluster node (Node0, Node1, Node2)
- Generate a clear table showing the distribution of primary nodes, which is key to understanding high availability topology
This entire advanced database management system ensures:
- Flexibility in configuration (single-DB vs multi-DB)
- Automatic data migration when changing modes
- Intelligent backup creation before potentially destructive operations
- Parallel database deployment to speed up the process
- Support for high availability clusters with replication
- Clear visualization of database topology
Thanks to these mechanisms, the deploy.sh script can reliably manage databases in various configurations, ensuring both data security and optimal resource utilization.
6. Controlled Stopping and Removal of Existing Deployment
The deploy.sh script implements a thoughtful, multi-stage process for stopping and removing existing deployment, ensuring safe and controlled environment cleanup before reinstallation. This phase is crucial to ensure that new deployment starts in a clean, predictable state without remnants of previous configuration.
This process is initiated in the section marked as “Tear down existing deployment”:
log_stage "Tear down existing deployment"
# Graceful service stopping using the svc-stop.sh script
timeout 300s /opt/scripts/svc-stop.sh --force 2> /dev/null || true
# If the QUICK option was not selected, wait an additional 120 seconds
if [ "$QUICK" = false ] ; then
sleep 120
fi
This initial code fragment performs the following tasks:
Graceful Service Stopping – The script calls `/opt/scripts/svc-stop.sh --force`, which methodically stops all services in a controlled manner. The `--force` parameter ensures that the operation will continue even if there are problems with some services.
Timeout for Long Operations – The `timeout 300s` command establishes a maximum time of 5 minutes to complete the stopping operation, which prevents the script from hanging in case of problems.
Error Ignoring – The `|| true` operator ensures that the script will continue even if the service stopping returns an error, which is important for operation idempotence (e.g., when services are already stopped).
Stabilization Period – If the quick deployment option (`QUICK`) wasn't selected, the script waits an additional 120 seconds, giving time for processes to fully terminate, resources to be released, and the system to stabilize.
The `svc-stop.sh` script performs many tasks related to safely stopping services:
# Fragment from svc-stop.sh
wait_deploy_health() {
while true; do
echo Health check iteration
/opt/health/run-once.sh deploy && break || sleep 5
done
}
export -f wait_deploy_health
if [[ "$@" != *"--force"* ]]; then
timeout 300s bash -c wait_deploy_health
fi
helm ls -n prelude --short | grep -o -Fx -f /etc/vmware-prelude/services.list | xargs -r -t -n 1 -P 0 helm uninstall -n prelude --timeout=1200s
This script:
- Ensures the environment is in a stable state (health check)
- Identifies installed Helm releases from the service list
- Calls `helm uninstall` for each service, with a long timeout (20 minutes) for safe stopping
After stopping services, deploy.sh proceeds to remove Kubernetes namespaces:
# Removing Kubernetes namespaces
k8s_delete_namespace "${NAMESPACE_INGRESS}" 300
k8s_delete_namespace "${NAMESPACE_PRELUDE}" 600
# Removing clusterrolebinding for prelude
timeout 300s kubectl delete clusterrolebinding "${NAMESPACE_PRELUDE}"-view 2> /dev/null || true
The `k8s_delete_namespace` function implements advanced logic for removing namespaces with retry mechanisms and handling "stuck" pods:
function k8s_delete_namespace() {
local ns="$1"
[[ -z "$2" ]] && local timeout=300 || local timeout="$2"
# Delete namespace with time limit
timeout "$timeout"s kubectl delete namespace "${ns}" 2> /dev/null || true
# Wait until namespace completely disappears (check every 5 seconds)
local retry_interval=5
local retries_count=$((timeout / retry_interval))
until [ $retries_count -eq 0 ]; do
((retries_count-=1))
local found=$(kubectl get namespaces --no-headers | cut -f 1 -d ' ' | grep -x "$ns" | wc -l)
[[ -z "$found" ]] && local found=0
if [ $found -eq 0 ]; then
return 0
else
sleep $retry_interval
fi
vracli cluster exec -- /opt/scripts/kill_stale_pods.sh "$1" || true
done
return 1
}
This complex function:
Initiates Namespace Deletion – Calls `kubectl delete namespace` with a specified timeout, which starts the cleanup process.
Monitors the Deletion Process – In a loop, checks if the namespace still exists, using the `kubectl get namespaces` command with filtering.
Eliminates "Stuck" Pods – During waiting, calls the `kill_stale_pods.sh` script, which identifies and terminates pods that may be blocking namespace deletion (e.g., due to stuck finalizers).
Handles Timeout – If after a specified time (default 300 or 600 seconds) the namespace still exists, the function returns an error, which may indicate problems with resources blocking deletion.
The `kill_stale_pods.sh` script implements low-level mechanisms to identify and terminate problematic pods:
#!/bin/bash
node=$(current_node)
for pod in $(kubectl get pods -n "$1" --field-selector="spec.nodeName=$node" -o jsonpath='{.items[*].metadata.uid}'); do
for id in $(docker ps -aq --no-trunc --filter="label=io.kubernetes.pod.uid=$pod" ); do
pkill -9 -ef "^containerd-shim .*moby/$id"
done
done
This script:
- Identifies pods in a specified namespace assigned to the current node
- For each pod finds its corresponding Docker containers
- Terminates containerd-shim processes associated with these containers
If the `DELETE_DATABASES` option is active, the deploy.sh script additionally cleans persistent data from disk:
if [ "$DELETE_DATABASES" = true ] ; then
log_stage "Deleting persisted data"
vracli cluster exec -- bash -c 'find /data/db -maxdepth 2 -type d -name live -printf "%P\n" | xargs -I {} rm -rf /data/db/{}'
vracli cluster exec -- bash -c 'rm -rf /data/openldap; mkdir -p /data/openldap'
vracli reset rabbitmq --confirm || true
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/clientsecrets", "value": ""}]' || true
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/crdssecret", "value": ""}]' || true
fi
This code fragment performs comprehensive data cleaning:
Removing PostgreSQL Data – Identifies and removes database directories (`/data/db/.../live`).
Cleaning LDAP – Removes the `/data/openldap` directory and creates a new, empty one.
RabbitMQ Reset – Calls `vracli reset rabbitmq --confirm`, which cleans the message broker configuration.
Secret Cleaning – Updates the vaconfig configuration object, removing saved client secrets and database secrets.
After completing the cleanup process, the script is ready to begin a new deployment in a fresh, clean environment.
The entire process of stopping and removing existing deployment is designed with:
- Safe and controlled service stopping
- Methodical removal of Kubernetes resources
- Intelligent handling of “stuck” pod problems
- Optional persistent data removal
- Resilience to errors and unpredictable states
Thanks to these mechanisms, the deploy.sh script ensures that new deployment begins in a clean, predictable state, which is key to reliable installation or update of VMware Aria Automation.
7. Infrastructure Initialization: Namespace and SSL Certificate Management
After the cleanup phase, the deploy.sh script proceeds to initialize basic infrastructure, creating necessary Kubernetes namespaces and configuring the SSL certificate management system. This phase is fundamental to the entire deployment process as it establishes a secure foundation on which subsequent components will be built.
log_stage "Creating kubernetes namespaces"
# Create the ingress namespace, if necessary
k8s_create_namespace "${NAMESPACE_INGRESS}"
# Create the prelude namespace, if necessary
k8s_create_namespace "${NAMESPACE_PRELUDE}"
The `k8s_create_namespace` function implements idempotent Kubernetes namespace creation:
function k8s_create_namespace() {
local ns="$1"
if [[ $(kubectl get namespaces --no-headers | cut -f 1 -d ' ' | grep -x "$ns" | wc -l) == 0 ]]; then
kubectl create namespace "$ns"
fi
}
This function:
- Checks if a namespace with the given name already exists
- If not, creates it using `kubectl create namespace`
- Thanks to the checking condition, the function is idempotent – it can be safely called multiple times without risk of errors
After creating namespaces, the script proceeds to configure SSL certificates:
log_stage "Applying ingress certificate"
/opt/scripts/prepare_certs.sh
/opt/scripts/apply_certs.sh
The `prepare_certs.sh` script handles generating or retrieving SSL certificates:
#!/bin/bash
set -e
# Generate self-signed certificate for ingress if such does not exist
vracli certificate ingress --list &>/dev/null || vracli certificate ingress --generate auto --set stdin
This simple but effective script:
- Checks if an ingress certificate already exists (using `vracli certificate ingress --list`)
- If it doesn't exist, generates a new, self-signed certificate using `vracli certificate ingress --generate auto`
- The `--set stdin` option allows interactively specifying certificate parameters (though in this case default values are used)
After preparing certificates, the `apply_certs.sh` script installs them in the Kubernetes cluster:
#!/bin/bash
CERT_INGRESS_PEM=$(mktemp --suffix=ingress.pem)
CERT_INGRESS_KEY=$(mktemp --suffix=ingress.key)
CERT_FQDN_PEM=$(mktemp --suffix=fqdn.pem)
CERT_PROXY_PEM=$(mktemp --suffix=proxy.pem)
# Remove existing secrets if they exist
if [[ $(kubectl get secrets -n ingress | grep cert-ingress | wc -c) -gt 0 ]]; then
kubectl delete secret -n ingress cert-ingress
fi
if [[ $(kubectl get secrets -n prelude | grep cert-ext | wc -c) -gt 0 ]]; then
kubectl delete secret -n prelude cert-ext
fi
# Get ingress certificate and key
vracli certificate ingress --list-key > $CERT_INGRESS_KEY
vracli certificate ingress --list > $CERT_INGRESS_PEM
# Create TLS secret for ingress
kubectl create secret tls cert-ingress \
--cert=${CERT_INGRESS_PEM} \
--key=${CERT_INGRESS_KEY} \
-n ingress || exit $?
rm -f ${CERT_INGRESS_KEY}
# Handle load-balancer and proxy certificates
vracli certificate load-balancer --list
if [[ $? = 0 ]]
then
vracli certificate load-balancer --list > $CERT_FQDN_PEM
else
mv $CERT_INGRESS_PEM $CERT_FQDN_PEM
fi
kubectl_cmd="kubectl -n prelude create secret generic cert-ext --from-file=fqdn.pem=${CERT_FQDN_PEM} "
vracli certificate proxy --list > $CERT_PROXY_PEM && kubectl_cmd="$kubectl_cmd --from-file=https_proxy.pem=${CERT_PROXY_PEM}"
echo $kubectl_cmd
$kubectl_cmd
rm -f $CERT_INGRESS_PEM $CERT_FQDN_PEM $CERT_PROXY_PEM || true
This more complex script:
Creates Temporary Files – Uses `mktemp` to create temporary files with appropriate suffixes.
Removes Existing Secrets – Checks if the `cert-ingress` and `cert-ext` secrets already exist and, if so, removes them to avoid conflicts.
Gets Certificates and Keys – Uses `vracli certificate` to retrieve the ingress certificate and its private key.
Creates TLS Secret – Uses `kubectl create secret tls` to create a secret in the `ingress` namespace, which will be used by the ingress controller for TLS termination.
Secures Private Key – Immediately removes the temporary file containing the private key, minimizing the risk of its exposure.
Handles Load-Balancer Certificates – Tries to retrieve the load-balancer certificate, and if it doesn't exist, uses the ingress certificate as a substitute.
Dynamically Builds Command – Constructs a `kubectl` command dynamically, depending on proxy certificate availability.
Creates Certificate Secret – Executes the built command, creating the `cert-ext` secret in the `prelude` namespace, containing FQDN certificates and optionally the proxy certificate.
Cleans Temporary Files – Removes all temporary files, regardless of operation outcome.
After configuring certificates, the script calls `apply_profiles.sh`, which applies configuration profiles:
/opt/scripts/apply_profiles.sh
The `apply_profiles.sh` script is responsible for activating and configuring system profiles that can modify standard platform behavior:
#!/bin/bash
set -uo pipefail
shopt -s nullglob
export PRELUDE_PROFILE_ROOT=/etc/vmware-prelude/profiles
# Check parameters
if [[ "$#" != 0 ]]; then
echo 'This command takes no arguments' >&2
exit 1
fi
# Iterate through all profiles
for profile in "$PRELUDE_PROFILE_ROOT"/*; do
export PRELUDE_PROFILE_PATH="$profile"
profile_name="${profile##*/}"
# Check profile structure correctness
if [[ ! -x "$profile"/check ]]; then
echo "Profile $profile_name malformed: check not executable" >&2
fi
# Execute check script to verify if profile should be active
if "$profile"/check; then
echo "Profile $profile_name: enabled" >&2
# Apply Helm overrides if they exist
if [[ -e "$profile/helm" ]]; then
/opt/scripts/apply-override-dir "$profile/helm" "$profile_name.profile.prelude.vmware.com" || {
echo "Profile $profile_name: failed to apply helm overrides" >&2
exit 1
}
fi
# Execute on-active script if it exists
if [[ -e "$profile/on-active" ]] || [[ -h "$profile/on-active" ]]; then
"$profile/on-active" || {
err="$?"
echo "Profile $profile_name: on-active failed with status $err" >&2
exit 1
}
fi
else
# Exit code 1 means "normally inactive". Any other code is an error.
err="$?"
if [[ "$err" != 1 ]]; then
echo "Profile $profile_name: check failed with status $err" >&2
exit 1
fi
echo "Profile $profile_name: disabled" >&2
# Execute on-inactive script if it exists
if [[ -e "$profile/on-inactive" ]] || [[ -h "$profile/on-inactive" ]]; then
"$profile/on-inactive" || {
err="$?"
echo "Profile $profile_name: on-inactive failed with status $err" >&2
exit 1
}
fi
fi
done
This advanced script performs the following tasks:
Environment Configuration – Sets shell options and defines the profile directory.
Profile Iteration – Searches the `/etc/vmware-prelude/profiles` directory and processes each found profile.
Structure Checking – Verifies if the profile contains an executable `check` script.
Activation State Determination – Calls the profile's `check` script to determine if it should be active:
- Exit code 0 means the profile should be active
- Code 1 means the profile should be inactive
- Any other code is treated as an error
Active Profile Configuration Application – For active profiles:
- If a `helm` directory exists, calls the `apply-override-dir` script to apply Helm configuration overrides
- If an `on-active` script exists, executes it
Inactive Profile Handling – For inactive profiles:
- If an `on-inactive` script exists, executes it
A system profile is a directory containing:
- A `check` script – determining if the profile should be active or not
- A `helm` directory – containing files that override Helm configuration values
- An `on-active` script – executed when the profile is active
- An `on-inactive` script – executed when the profile is inactive (a minimal hypothetical example of such a profile is sketched below)
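As an illustration, the check script of a minimal hypothetical profile could decide activation based on a marker file; both the profile name and the marker path below are invented for the example:

```bash
#!/bin/bash
# Hypothetical /etc/vmware-prelude/profiles/example-profile/check script.
# Exit 0 -> profile active, exit 1 -> profile inactive,
# any other exit code is treated as an error by apply_profiles.sh.
[[ -f /etc/vmware-prelude/example-profile.enabled ]] && exit 0
exit 1
```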
This plugin architecture allows extending the functionality and customizing the behavior of the deploy.sh script without modifying its source code, which is key to maintainability and extensibility.
It's worth noting the `apply-override-dir` script, which is used to apply Helm configuration overrides from profiles:
#!/bin/bash
set -ueo pipefail
progname="$0"
die() {
echo "$progname: $1" >&2
exit 1
}
# ... argument processing ...
for f in "$1"/*.yaml; do
[[ -f "$f" ]] || continue
chart_name="${f##*/}"
chart_name="${chart_name%.yaml}"
echo "Applying $chart_name override from $1"
/opt/scripts/apply-override -n "$namespace" -p "$priority" -s "$chart_name" "$name" < "$f"
done
This script:
- Iterates through YAML files in the profile directory
- For each file extracts the chart name (from filename)
- Calls the `apply-override` script to apply overrides for the specific chart
This entire infrastructure initialization phase:
- Creates necessary Kubernetes namespaces
- Configures SSL certificates for secure communication
- Applies system profiles, customizing platform behavior
- Establishes a secure and flexible base for further component deployment
Thanks to these mechanisms, the deploy.sh script ensures that the basic infrastructure is properly configured and ready to accept application components, which is a key step in the VMware Aria Automation deployment process.
8. Network Connection Configuration: etcd, proxy, and NTP
After initializing the basic Kubernetes infrastructure, the deploy.sh script proceeds to configure key network services that are essential for the proper functioning of the entire ecosystem. This phase includes configuring the etcd data storage system, HTTP proxy settings, and time synchronization (NTP).
log_stage "Updating etcd configuration to include https_proxy if such exists"
vracli proxy show || {
vracli proxy set-default
vracli proxy show
}
vracli proxy update-etcd
#
# Show and apply NTP configuration if such exists.
#
vracli ntp show-config || true
vracli ntp status || true
This code fragment performs several important tasks:
1. HTTP Proxy System Configuration
The first step is to ensure that the HTTP proxy is properly configured:
vracli proxy show || {
vracli proxy set-default
vracli proxy show
}
This code block:
- Checks Current Proxy Configuration – Calls `vracli proxy show`, which displays current HTTP proxy settings.
- Sets Default Configuration if Needed – If the command returns an error (proxy is not configured), executes the code block in curly braces:
  - `vracli proxy set-default` – sets default proxy configuration
  - `vracli proxy show` – displays the newly set configuration
The `vracli proxy` tool is an advanced component that manages HTTP proxy configuration for the entire environment. It can:
- Set URL addresses for HTTP and HTTPS proxies
- Configure exception lists (hosts and domains that should not use proxy)
- Set credentials if the proxy requires authentication
- Distribute proxy configuration to all system components
2. etcd Configuration Update
After ensuring that the proxy is properly configured, the script updates the configuration in etcd:
vracli proxy update-etcd
Etcd is a distributed key-value database that plays a critical role in the Kubernetes and VMware Aria Automation ecosystem:
- It stores Kubernetes cluster configuration
- Contains data about component states
- Stores configuration settings for various services
The `vracli proxy update-etcd` command propagates proxy settings to etcd, which means:
- All components can access current proxy configuration
- New pods will automatically receive proper settings
- Configuration is stored in a central location, making it easier to manage
This operation is particularly important in corporate environments where access to external resources is controlled by HTTP proxies and where improper proxy configuration can lead to connectivity issues.
3. NTP Verification and Configuration
The last element of this phase is verification and potential configuration of time synchronization:
vracli ntp show-config || true
vracli ntp status || true
These commands:
- Display Current NTP Configuration – `vracli ntp show-config` shows which NTP servers are configured.
- Check Synchronization Status – `vracli ntp status` verifies if the system clock is properly synchronized.
The `|| true` operator after both commands ensures that the script will continue even if these commands return an error (e.g., when NTP is not configured).
Proper time synchronization is critically important for a distributed environment for several reasons:
- It enables consistent logging and monitoring
- It’s essential for cryptographic protocols and authorization mechanisms
- It affects the proper functioning of transaction mechanisms in databases
- It ensures correct functioning of cache and data expiration mechanisms
Although the script does not explicitly configure NTP, it displays the current state, allowing the administrator to verify if time synchronization is correct. If needed, the administrator can use the `vracli ntp set` command to configure NTP servers.
It's worth noting that `vracli` is an advanced platform management tool for VMware Aria Automation that encapsulates many configuration operations, simplifying the management process. For network configurations like proxy and NTP, this tool provides:
- A consistent interface for various configuration operations
- Input data validation
- Setting propagation to all components
- Configuration correctness verification
This entire network connection configuration phase ensures that the VMware Aria Automation environment is properly configured in terms of:
- Access to external resources (via proxy)
- Configuration storage and distribution (via etcd)
- Time synchronization (via NTP)
These elements are fundamental to the proper functioning of the entire ecosystem and ensure that all components can reliably communicate both internally and with external resources.
9. Credential Management and Service Deployment Using Helm
After configuring the basic infrastructure and network services, the deploy.sh script proceeds to one of the most critical stages – generating credentials, managing secrets, and deploying service components using Helm. This phase is key to ensuring platform security and its proper functioning.
cd /opt/charts
log_stage "Deploying infrastructure services"
set +x
# Prepare credentials and database configuration
source /opt/scripts/persistence_utils.sh
credentials_load
/opt/scripts/generate_credentials.sh
credentials_save
# Load database settings
helm-upstall db-settings "" "${NAMESPACE_PRELUDE}"
The first stage of this phase is preparing and managing credentials:
Change Working Directory – The script changes to the `/opt/charts` directory, where Helm chart definitions for deployed services are located.
Disable Command Display – The `set +x` instruction turns off command display, which is crucial for security since subsequent operations involve sensitive data (passwords, keys).
Load Persistent Data Management Tools – The script sources `/opt/scripts/persistence_utils.sh`, which contains functions for managing secrets and configuration.
Load Existing Credentials – The `credentials_load` function restores previously saved credentials from the configuration object (CRD) or initializes new ones:
credentials_load() {
tmpfile=$(mktemp)
kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.crdssecret' | base64 -d > $tmpfile
if [ ! -s $tmpfile ]; then
rm -f $tmpfile
kubectl -n prelude create secret generic db-credentials --from-literal=postgres=change_me
else
kubectl apply -f $tmpfile
rm -f $tmpfile
fi
}
This function:
- Creates a temporary file
- Gets the encoded `crdssecret` value from the configuration object
- Decodes it from base64 and writes it to the file
- If the file is empty (no saved credentials), creates an empty secret with default values
- Otherwise applies the saved secret using `kubectl apply`
Generate New Credentials – The script calls `/opt/scripts/generate_credentials.sh`, which creates a comprehensive set of credentials for various platform components:
#!/bin/bash
set -e
source /opt/scripts/persistence_utils.sh
# Generate passwords for databases
credential_add_from_command "postgres" /opt/scripts/generate_pass.sh
credential_add_from_command "repmgr-db" /opt/scripts/generate_pass.sh
credential_add_from_command "abx-db" /opt/scripts/generate_pass.sh
# ... other databases ...
# Generate passwords for OpenLDAP
credential_add_from_command "openldap-admin" /opt/scripts/generate_pass.sh
credential_add_from_command "openldap-config" /opt/scripts/generate_pass.sh
# Generate encryption keys
credential_add_from_command "identity-encoder-salt" /opt/scripts/generate_encryption_key_base64.sh 32
credential_add_from_command "project-encryption-key" /opt/scripts/generate_encryption_key_base64.sh 48
credential_add_from_command "encryption-keys.json" bash -c 'echo "{"primary":1,"keys":[{"version":1,"value":"$(/opt/scripts/generate_encryption_key_base64.sh 32)"}]}"'
credential_add_from_command "key" /opt/scripts/generate_encryption_key.sh 48
credential_add_from_command "rsaKey" /opt/scripts/generate_rsa_encryption_key.sh 2048
# RabbitMQ configuration
credential_add_from_command "rabbitmq" /opt/scripts/generate_pass.sh
credential_add_from_command "rabbitmqConfig" /opt/scripts/generate_rmq_config.sh "$(credential_get "rabbitmq")"
credential_add_from_command "rabbitmq-erlang-cookie" /opt/scripts/generate_pass.sh
This script generates various types of credentials:
- Database passwords – random alphanumeric strings
- Base64-encoded encryption keys – used for encoding sensitive data
- RSA keys – used for signing JWT tokens
- RabbitMQ configuration – including erlang cookie, which is critical for clustering
The credential_add_from_command
function calls a specified command and adds its result to the secret:
credential_add_from_command() {
local key=$1
shift
if [ "$1" == "--force" ]; then
shift
elif credential_exists "$key"; then
return 0
fi
value=$("$@" | base64 -w 0)
kubectl patch secret db-credentials -n prelude --type=json -p="[{\"op\":\"add\", \"path\":\"/data/$key\", \"value\":\"$value\"}]"
}
- Save Generated Credentials – The
credentials_save
function saves the updated secret back to the configuration object:
credentials_save() {
secrets=$(kubectl get secret db-credentials -n prelude -o yaml | base64)
crdssecret=$(echo $secrets | tr -d '\n')
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/crdssecret", "value": "'"$crdssecret"'"}]'
}
- Load Database Settings – The script calls
helm-upstall db-settings
, which creates a ConfigMap with settings for databases:
helm-upstall db-settings "" "${NAMESPACE_PRELUDE}"
After preparing credentials, the script generates SSH keys for PostgreSQL and saves them as a Kubernetes secret:
# Generate SSH keys for PostgreSQL
SSH_DIR=$(mktemp -d /tmp/ssh-keys-XXXXXXXX)
ssh-keygen -N "" -f $SSH_DIR/id_rsa
kubectl -n ${NAMESPACE_PRELUDE} create secret generic postgres-ssh \
    --from-literal=private-key=$(base64 -w 0 $SSH_DIR/id_rsa) \
    --from-literal=public-key=$(base64 -w 0 $SSH_DIR/id_rsa.pub)
rm -rf $SSH_DIR
This code fragment:
- Creates a temporary directory
- Generates an SSH key pair without a password (-N "")
- Creates a postgres-ssh secret with the encoded key pair
- Removes the temporary directory, minimizing the risk of key exposure
Next, the script deploys services using Helm. An important aspect is parallel execution of these operations, which significantly speeds up the deployment process:
set -x
# Set directory for installation statuses
export UPSTALL_STATUS_DIR=/tmp/deploy_$(date +%Y%m%d%H%M%S)
mkdir -p $UPSTALL_STATUS_DIR
# Parallel service deployment using Helm
helm-upstall endpoint-secrets "INGRESS_URL=${INGRESS_URL},INGRESS_CERT=${INGRESS_CERT},NODE_NAMES=${NODE_NAMES}" "$NAMESPACE_PRELUDE" &
helm-upstall no-license "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &
helm-upstall rabbitmq-ha "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &
This code fragment performs the following tasks:
- Enable Command Display – The set -x instruction restores command display mode since credential operations have been completed.
- Create Status Directory – The script creates a /tmp/deploy_YYYYMMDDHHMMSS directory, which will be used to monitor the status of parallel Helm installations.
- Parallel Service Deployment – The helm-upstall function is called three times, each with an & symbol at the end, meaning background (asynchronous) execution:
  - endpoint-secrets – endpoint secret configuration
  - no-license – licensing service
  - rabbitmq-ha – RabbitMQ cluster with high availability
The helm-upstall
function wraps the call to the helm-upstall
script, which combines helm upgrade
and helm install
operations:
helm-upstall() {
# ... initialization code ...
/opt/scripts/helm-upstall --namespace="$3" --release-name="$release_name" --chart-path="$service_name" --set-string="$2" --set="$4" --timeout="$6" $5 || result=$?
# ... result handling code ...
}
This function ensures idempotence of the deployment operation – it will work correctly regardless of whether a given Helm chart was previously installed or not.
The CHECK_DIR=${UPSTALL_STATUS_DIR}
parameter causes the operation status to be saved to a file, allowing later checking if all parallel installations completed successfully:
check-helm-upstalls-status() {
trap "clear-helm-upstalls-status $1" RETURN
# Check if there are failed Helm install/upgrade operations
check_files_count=$(cat $1/*.check | wc -l)
failure_count=$(cat $1/*.check | grep 1 | wc -l)
echo Failure counts are ${failure_count}, from $check_files_count finished
if [ "${failure_count}" -gt "0" ]; then
log_stage "There are failed install/upgrade of helm releases"
return 1
fi
return 0
}
This function:
- Counts the number of status files and the number of failures
- If there are failures, it reports this and returns an error code
- Otherwise, it returns success
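The helper that writes these .check files is not quoted in this document. A minimal sketch of how a background installation could record its result so that check-helm-upstalls-status can count failures might look like this (the function name and the file format are assumptions):
# Hypothetical sketch: run an upgrade-or-install and record a 0/1 result
# as a one-line ".check" file in the shared status directory.
record_upstall_status() {
    local release="$1" check_dir="$2" rc=0
    helm upgrade --install "$release" "/opt/charts/$release" -n prelude || rc=1
    echo "$rc" > "${check_dir}/${release}.check"   # 0 = success, 1 = failure
}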
This entire phase of credential management and service deployment is crucial for the security and functionality of the VMware Aria Automation platform:
- It ensures secure storage and distribution of credentials
- It generates unique, cryptographically strong passwords and keys
- It saves credentials as Kubernetes secrets
- It efficiently deploys services through parallel execution
- It monitors deployment status to ensure reliability
This advanced implementation guarantees that:
- Each deployment has unique, strong credentials
- Credentials are securely stored
- The deployment process is idempotent and reliable
- Deployment is time-efficient due to parallel task execution
10. Advanced Identity Configuration and Authentication Mechanisms
After the phase of deploying basic infrastructure services, the deploy.sh script proceeds to configure the identity system and authentication mechanisms. This section is crucial for the security of the entire platform as it establishes the foundations of authorization and access control.
set +x
if output=$(vracli vidm); then
identity_profile=vidm
admin_client_id=$(echo "${output}" | jq -r '.clients|.ClientID')
admin_client_secret=$(echo "${output}" | jq -r '.clients|.ClientSecret')
vidm_client_id_user=$(echo "${output}" | jq -r '.clients|.ClientIDUser')
org_owner=$(echo "${output}" | jq -r '.user')
# Storing the prelude clients which will be used later in generate_client_ids.sh
PRELUDE_CLIENTS="$admin_client_id,$vidm_client_id_user"
elif ldap=$(kubectl get vaconfigs.prelude.vmware.com prelude-vaconfig -o json | jq -e .spec.ldap); then
echo "
#####################################################
# LDAP deployments are not meant for production use #
# and are not supported in HA environments! #
#####################################################
"
identity_profile=ldap
admin_client_id=$(echo $ldap | jq -r ".client_id")
admin_client_secret=$(echo $ldap | jq -r ".client_secret")
org_owner=$(echo $ldap | jq -r ".default_org_owner")
PRELUDE_CLIENTS="$admin_client_id"
else
echo "No vIDM configuration has been provided!"
exit 1
fi
set +e
This code fragment implements intelligent detection of the available identity management system and automatically retrieves necessary authentication data:
- Disable Command Display – The set +x instruction prevents displaying sensitive information such as client IDs and secret keys.
- vIDM Detection and Configuration – The script first checks if vIDM (VMware Identity Manager) is configured by calling vracli vidm:
  - If the operation succeeds, it extracts client IDs and secret keys from the result
  - It also records the organization owner username
  - Initializes PRELUDE_CLIENTS as a list containing the administrator client ID and user ID
- Alternatively, LDAP Detection – If vIDM is not configured, the script checks if LDAP configuration is available:
  - Displays a warning that LDAP deployments are not intended for production use
  - Gets the client ID and secret key from the LDAP configuration
  - Records the default organization owner name
  - Initializes PRELUDE_CLIENTS with only the administrator client ID
- Handling Missing Configuration – If neither vIDM nor LDAP is configured, the script exits with an error, informing that no vIDM configuration was provided.
- Disable Strict Error Mode – The set +e instruction disables automatic script termination on errors, which is needed in subsequent operations that may legitimately return non-zero exit codes.
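As a side note, the same fields can be inspected manually with jq when troubleshooting identity detection (illustrative command; it assumes vracli vidm returns the JSON structure parsed above):
vracli vidm | jq -r '.clients.ClientID, .clients.ClientIDUser, .user'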
After detecting the identity system and retrieving credentials, the script uses this information to generate OAuth client IDs for all components:
/opt/scripts/generate_client_ids.sh "$PRELUDE_CLIENTS"
The generate_client_ids.sh
script is a complex script that:
- Processes templates from the /opt/charts/client-secrets/templates/ directory
- Generates or retrieves client IDs for each service
- Creates a ConfigMap containing a list of all managed clients
csp_generate_client_ids() {
    local identity_managed_clients=
    local cached_client_ids=,${CLIENT_SECRETS_VALUES}
    for file in /opt/charts/client-secrets/templates/*; do
        # Skip files that are not service templates
        if [[ "$file" == *"csp-fixture-job.yaml" || "$file" == *"dependencies.yaml" || "$file" == *"NOTES.txt" || "$file" == *"_helpers.tpl" ]]; then
            continue
        fi
        # Get service name and client ID prefix from template
        service_name=$(cat $file | grep -A 2 "metadata" | grep "name" | cut -d':' -f2 | sed -e 's/^[[:space:]]*//')
        service_client_id_prefix=$(cat $file | grep -A 2 "data" | grep "clientid" | cut -d':' -f2 | sed -e 's/^[[:space:]]*//' | cut -d'{' -f1)
        merged_service_name=$(echo $service_name | sed 's/-//g')
        # Check if ID is already cached
        if [[ "$cached_client_ids" == *",$merged_service_name="* ]]; then
            # Get existing client ID
            clientid=$(kubectl -n prelude get configmaps "$service_name" -o json | jq -r '.data.clientid')
            redirect_uri=$(kubectl -n prelude get configmaps "$service_name" -o json | jq -r '.data.redirecturi')
            # ... update redirect URI if changed ...
        else
            # Generate new client ID
            random_client_id_suffix=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 16)
            clientid="${service_client_id_prefix}${random_client_id_suffix}"
            service_suffix_helm_key="${merged_service_name}clientsuffix"
            CLIENT_SECRETS_VALUES="${CLIENT_SECRETS_VALUES},${merged_service_name}=${random_client_id_suffix}"
        fi
        # Add client ID to managed clients list
        identity_managed_clients+=", $clientid"
    done
    # Add predefined Prelude clients to the list
    if [[ "$PRELUDE_CLIENTS" && "$PRELUDE_CLIENTS" != ',' ]]; then
        IFS=, read -r -a prelude_clients_array <<< "$PRELUDE_CLIENTS"
        for client in "${prelude_clients_array[@]}"; do
            if [[ -n "$client" ]] && [[ "$client" != ' ' ]]; then
                identity_managed_clients+=", $client"
            fi
        done
    fi
    # Create ConfigMap with client list
    kubectl -n "$NAMESPACE_PRELUDE" create configmap identity-clients \
        --from-literal=clients="${identity_managed_clients:2}" --dry-run=client -o yaml | kubectl apply -f -
}
This advanced function:
- Iterates through the template files in /opt/charts/client-secrets/templates/
- For each template:
  - Extracts the service name and client ID prefix
  - Checks if the client ID has already been generated and saved (cached)
  - If so, retrieves it from the ConfigMap
  - If not, generates a new random suffix, creates the full client ID, and updates the CLIENT_SECRETS_VALUES variable with the new value
- Adds all generated client IDs to the identity_managed_clients list
- Also adds the predefined client IDs from PRELUDE_CLIENTS
- Creates the identity-clients ConfigMap containing a list of all clients
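After the function has run, the resulting client list can be verified directly (illustrative kubectl query using the ConfigMap name created above):
kubectl -n prelude get configmap identity-clients -o jsonpath='{.data.clients}'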
After generating client IDs, the deploy.sh script deploys all identity-related components:
log_stage "Deploying identity services"
# Deploy identity service charts
helm-upstall client-secrets "$CLIENT_SECRETS_VALUES" "$NAMESPACE_PRELUDE" && wait_release client-secrets
# Deploy the identity service if using LDAP (in vIDM case, identity service is not needed)
if [ "$identity_profile" = "ldap" ]; then
helm-upstall identity "$VALUES" "$NAMESPACE_PRELUDE" && wait_release identity
# Deploy openldap if using LDAP
helm-upstall openldap "$VALUES" "$NAMESPACE_PRELUDE" && wait_release openldap
fi
This code fragment:
- Deploys the client-secrets chart, which contains the OAuth client IDs
- In the case of LDAP, also deploys the identity service (identity) and OpenLDAP
- Uses the wait_release function to wait until each chart is successfully deployed
The wait_release
function monitors Helm release status:
wait_release() {
local release=$1
local timeout=${2:-300} # default 5 minutes
echo "Waiting for release $release to be ready..."
local start_time=$(date +%s)
while true; do
local status=$(helm status -n $NAMESPACE_PRELUDE $release -o json 2>/dev/null | jq -r '.info.status' 2>/dev/null)
if [[ "$status" == "deployed" ]]; then
echo "Release $release is ready"
return 0
fi
local current_time=$(date +%s)
local elapsed=$((current_time - start_time))
if [[ $elapsed -gt $timeout ]]; then
echo "Timeout waiting for release $release to be ready"
return 1
fi
sleep 5
done
}
After deploying identity services, the script registers endpoints for key components such as vRealize Orchestrator (vRO) and Action Based Extensibility (ABX):
log_stage "Registering service endpoints"
if [[ "$identity_profile" = "vidm" ]]; then
/opt/scripts/register_vro_endpoint.sh
if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
/opt/scripts/register_abx_endpoint.sh
fi
fi
The register_vro_endpoint.sh
script contains advanced logic for detecting and configuring the vRO endpoint:
create_or_update_vro() {
source /opt/scripts/retry_utils.sh
set -o pipefail
retry_backoff "5 15 45" "Failed to load existing vRO config" "load_existing_config"
if [ ! -z "${CURRENT// }" ]
then
retry_backoff "5 15 45" "Failed to update existing vRO config" "update_existing"
else
retry_backoff "5 15 45" "Failed to register vRO" "register_vro"
fi
set +o pipefail
}
This function:
- Uses the retry_backoff mechanism to retry operations in case of temporary problems
- Tries to load the existing vRO configuration
- If configuration exists, updates it
- Otherwise registers a new endpoint
Similarly, the register_abx_endpoint.sh
script registers an endpoint for the Action Based Extensibility service:
create_or_update_abx() {
source /opt/scripts/retry_utils.sh
set -o pipefail
retry_backoff "5 15 45 135 405 1215" "Failed to query number of ABX endpoints" "count_abx_endpoints"
if [ "$ABX_ENDPOINTS_COUNT" -gt "0" ]
then
retry_backoff "5 15 45 135 405 1215" "Failed to update existing ABX config" "update_existing"
else
retry_backoff "5 15 45 135 405 1215" "Failed to register ABX endpoint" "register_abx_endpoint"
fi
set +o pipefail
}
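Both registration functions rely on retry_backoff from retry_utils.sh, which is not shown in full in this document. A minimal sketch of such a helper, assuming the calling convention used above (a space-separated list of delays, an error message, and a command to run), could look like this:
# Minimal sketch, not the actual retry_utils.sh implementation.
retry_backoff() {
    local delays="$1" message="$2" command="$3"
    local delay
    for delay in $delays; do
        if $command; then
            return 0
        fi
        echo "${message} - retrying in ${delay}s"
        sleep "$delay"
    done
    # one last attempt after the final delay
    $command || { echo "${message} - giving up"; return 1; }
}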
This entire identity and authentication configuration phase:
- Automatically detects the available identity management system (vIDM or LDAP)
- Retrieves necessary credentials and client IDs
- Generates unique OAuth client IDs for all components
- Deploys identity and authentication related services
- Registers endpoints for key components
Thanks to these mechanisms, the deploy.sh script ensures:
- A consistent and secure identity and authentication system
- Uniqueness of OAuth client IDs
- Secure storage and distribution of credentials
- Proper endpoint registration for service integration
- Resilience to temporary problems through retry mechanisms
This comprehensive implementation is the foundation of security and integration of all VMware Aria Automation platform components.
11. Endpoint Registration and Specialized Component Configuration
After configuring the basic infrastructure and identity system, the deploy.sh script proceeds to register endpoints and configure specialized components. This phase includes registering endpoints for components such as vRealize Orchestrator (vRO) and Action Based Extensibility (ABX), as well as configuring other key specialized services.
log_stage "Registering service endpoints"
if [[ "$identity_profile" = "vidm" ]]; then
/opt/scripts/register_vro_endpoint.sh
if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
/opt/scripts/register_abx_endpoint.sh
fi
fi
This code fragment makes endpoint registration dependent on the previously detected identity profile (vIDM or LDAP) and configuration:
- Conditional vRO Registration – The script calls register_vro_endpoint.sh only in the case of a vIDM configuration, as vRO integration requires this system.
- Conditional ABX Registration – Additionally, if extensibility support is enabled (ENABLE_EXTENSIBILITY_SUPPORT), the script also registers the ABX endpoint.
The register_vro_endpoint.sh
script contains advanced logic for detecting, building a host filter, and configuring the vRO endpoint:
#!/bin/bash
# Verify $CSP_AUTH_TOKEN defined
: ${CSP_AUTH_TOKEN:?}
# Verify INGRESS_URL defined
: ${INGRESS_URL:?}
PROVISIONING_URL="http://provisioning-service.prelude.svc.cluster.local:8282"
CERT="$(vracli certificate load-balancer --list || vracli certificate ingress --list)"
CERT_JSON=$(jq --null-input --compact-output --arg str "$CERT" '$str')
build_host_filter() {
    local nodeList=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')
    declare -a nodeArray=($nodeList)
    local searchQuery="(name eq 'embedded-VRO') or (endpointProperties.hostName eq '$INGRESS_URL:443')"
    for nodeName in "${nodeArray[@]}"; do
        if [ "${nodeName}" == "${FQDN}" ]
        then
            # single node
            echo "($searchQuery)"
            return
        else
            searchQuery+=" or (endpointProperties.hostName eq 'https://$nodeName:443')"
        fi
    done
    echo "($searchQuery)"
}
load_existing_config() {
    local hostFilter=$(build_host_filter)
    CURRENT=$(curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?expand&external&\$filter=((endpointType eq 'vro') and (customProperties.vroAuthType eq 'CSP') and $hostFilter)" \
        -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
        -H 'Cookie: csp-auth-token='$CSP_AUTH_TOKEN \
        | jq '.documents | .[] | .endpointProperties.certificate |= '"${CERT_JSON}"' | .endpointProperties.hostName |= "'$INGRESS_URL':443"')
}
update_existing() {
    curl -k -f -X PUT "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
        -H 'Content-Type: application/json' \
        -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
        -H 'Cookie: csp-auth-token='$CSP_AUTH_TOKEN \
        -d "${CURRENT}"
}
register_vro() {
    curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
        -H 'Content-Type: application/json' \
        -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
        -H 'Cookie: csp-auth-token='$CSP_AUTH_TOKEN \
        -d '{"endpointProperties":{"hostName":"'$INGRESS_URL':443","dcId":"0","privateKeyId":"vcoadmin","privateKey":"vcoadmin","certificate":'"${CERT_JSON}"',"acceptSelfSignedCertificate":true,"vroAuthType":"CSP"},"customProperties":{"isExternal":"true"},"endpointType":"vro","associatedEndpointLinks":[],"name":"embedded-VRO","tagLinks":[]}'
}
create_or_update_vro() {
source /opt/scripts/retry_utils.sh
set -o pipefail
retry_backoff "5 15 45" "Failed to load existing vRO config" "load_existing_config"
if [ ! -z "${CURRENT// }" ]
then
retry_backoff "5 15 45" "Failed to update existing vRO config" "update_existing"
else
retry_backoff "5 15 45" "Failed to register vRO" "register_vro"
fi
set +o pipefail
}
# Main execution
create_or_update_vro
This complex script performs the following tasks:
- Required Variable Verification – Checks that the CSP_AUTH_TOKEN and INGRESS_URL variables are set, which is necessary for proper registration.
- Certificate Retrieval – Gets the load-balancer or ingress certificate and formats it as JSON.
- Host Filter Building – The build_host_filter function dynamically creates a search filter that includes:
  - An endpoint named “embedded-VRO”
  - Hosts with the ingress URL address
  - All cluster nodes (for multi-node environments)
- Loading Existing Configuration – The load_existing_config function checks if the vRO endpoint already exists, using the built filter.
- Update or Registration – Depending on the check result, the script either updates the existing endpoint (update_existing) or registers a new one (register_vro).
- Retry Mechanism – All operations use retry_backoff with various delays to handle temporary problems.
Similarly, the register_abx_endpoint.sh
script registers an endpoint for Action Based Extensibility:
#!/bin/bash
# Verify $CSP_AUTH_TOKEN defined
: ${CSP_AUTH_TOKEN:?}
source /opt/scripts/csp_functions.sh
PROVISIONING_URL="http://provisioning-service.prelude.svc.cluster.local:8282"
# Define OpenFaaS properties
OPENFAAS_ADDRESS="http://gateway.openfaas.svc.cluster.local:8080"
echo "OpenFaaS address: "${OPENFAAS_ADDRESS}
count_abx_endpoints() {
    ABX_ENDPOINTS_COUNT=$(curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external&\$filter=(endpointType eq 'abx.endpoint')" \
        -H 'Content-Type: application/json' \
        -H 'Authorization: Bearer '$CSP_AUTH_TOKEN | jq .totalCount)
}
register_abx_endpoint() {
    curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
        -H 'Content-Type: application/json' \
        -H 'Authorization: Bearer '$CSP_AUTH_TOKEN \
        -d '{"endpointProperties":{"apiEndpoint":"'$OPENFAAS_ADDRESS'","privateKeyId":"","privateKey":""},"customProperties":{"isExternal":"true"},"endpointType":"abx.endpoint","associatedEndpointLinks":[],"name":"embedded-ABX-onprem","tagLinks":[]}'
}
create_or_update_abx() {
source /opt/scripts/retry_utils.sh
set -o pipefail
retry_backoff "5 15 45 135 405 1215" "Failed to query number of ABX endpoints" "count_abx_endpoints"
if [ "$ABX_ENDPOINTS_COUNT" -gt "0" ]
then
retry_backoff "5 15 45 135 405 1215" "Failed to update existing ABX config" "update_existing"
else
retry_backoff "5 15 45 135 405 1215" "Failed to register ABX endpoint" "register_abx_endpoint"
fi
set +o pipefail
}
# Main execution
create_or_update_abx
This script:
- Defines the OpenFaaS address (function engine used by ABX)
- Checks if the ABX endpoint already exists using count_abx_endpoints
- Depending on the result, updates the existing endpoint or registers a new one
- Uses the retry_backoff mechanism with an aggressive retry scheme (pauses of up to 1215 seconds)
After registering endpoints, the deploy.sh script deploys other specialized components, such as VMware Event Broker Appliance (VEBA) and analytics services:
log_stage "Deploying specialized components"
# Deploy VEBA if extensibility support is enabled
if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
helm-upstall veba "$VALUES,extensibilityEnabled=true" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
fi
# Deploy analytics components if enabled
if [[ "$ENABLE_ANALYTICS" == "true" ]]; then
helm-upstall analytics-collector "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
helm-upstall analytics-service "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
fi
This code fragment shows a “feature flags” approach, where specialized components are deployed only when appropriate flags (like ENABLE_EXTENSIBILITY_SUPPORT
or ENABLE_ANALYTICS
) are enabled.
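How these flags are initialized is not shown in the excerpts above; a plausible pattern is to default them to "false" so that optional components stay disabled unless explicitly requested (variable names as used above, defaulting logic assumed):
# Assumed defaulting pattern - not quoted from deploy.sh.
: "${ENABLE_EXTENSIBILITY_SUPPORT:=false}"
: "${ENABLE_ANALYTICS:=false}"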
The next stage is configuring the organization alias, which is key for multi-tenant environments:
log_stage "Configuring organization alias"
if [[ "$identity_profile" = "vidm" ]]; then
source /opt/scripts/vidm_functions.sh
source /opt/scripts/csp_functions.sh
# Set up variables for CSP
DEFAULT_ORG_NAME=$(get_default_tenant_name)
DEFAULT_ORG_ALIAS=$(get_default_tenant_alias "$admin_token")
# Update organization alias in identity service
csp_auth "$admin_client_id" "$admin_client_secret"
csp_retrieve_orgs
patch_identity_with_default_org_alias
fi
This fragment:
- Loads functions for handling vIDM and CSP
- Gets the name and alias of the default organization (tenant)
- Authenticates with CSP using administrator client ID and secret
- Retrieves organization information
- Updates the organization alias in the identity service
The patch_identity_with_default_org_alias
function implements intelligent alias updating:
patch_identity_with_default_org_alias () {
local is_alias_updated=$(kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.vidm.isDefaultOrgAliasUpdated')
if [ "$is_alias_updated" = true ]; then
timestamped_echo "The default organization alias is already updated in identity service."
return 0
fi
local alias=$(kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.vidm.defaultOrgAlias //empty')
if [ -z "$alias" ]; then
timestamped_echo "Default organization doesn't have an alias."
else
timestamped_echo "Updating the default organization alias in identity service."
identity_patch_alias "$alias"
fi
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/vidm/isDefaultOrgAliasUpdated", "value": true}]'
}
This function:
- Checks if the alias has already been updated
- If not, gets the alias from configuration
- Updates the alias in the identity service
- Sets the
isDefaultOrgAliasUpdated
flag to true to avoid multiple updates
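An administrator can confirm the result afterwards with a direct query (illustrative command using the same vaconfig path as the function above):
kubectl get vaconfig prelude-vaconfig -o jsonpath='{.spec.vidm.isDefaultOrgAliasUpdated}'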
This entire phase of endpoint registration and specialized component configuration ensures:
- Integration of key components, such as vRO and ABX, with the identity system
- Conditional deployment of specialized components (VEBA, analytics)
- Proper configuration of organization aliases for multi-tenant environments
- Resilience to temporary problems through retry mechanisms
Thanks to these mechanisms, the deploy.sh script ensures comprehensive configuration and integration of all specialized components, which is key to the full functionality of the VMware Aria Automation platform.
12. Service Toggling and State Management
After completing the deployment and configuration phase of individual components, the deploy.sh script proceeds to the key stage of service toggling, which ensures that all services are enabled and configured according to requirements. This phase is essential because some services may require special startup or configuration after deployment.
log_stage "Toggling services"
# Store list of enabled services in vaconfig
/opt/scripts/store_enabled_svc.sh
# Get service states from vaconfig
STATES=$(kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.services.states // "{}"')
# Toggle services with appropriate parameters
/opt/scripts/toggle_services.sh "$VALUES" "$APP_SELECTOR" "$TARGET_PORT" "$EXTRA_VALUES" "$STATES" "" "true" "$UPSTALL_STATUS_DIR"
This code fragment performs several key tasks:
- Saving List of Enabled Services – The
store_enabled_svc.sh
script collects information about services that should be enabled and saves it in the configuration object:
#!/bin/bash
set -eu
SERVICES_TO_TOGGLE="$(cat /etc/vmware-prelude/services.list) $(/opt/scripts/capsvc_enabled.sh)"
SERVICES_TO_TOGGLE="${SERVICES_TO_TOGGLE// /$'\n'}"
SERVICES_TO_TOGGLE=$(echo "$SERVICES_TO_TOGGLE" | sort -u )
CAP_DISABLED_SERVICES="$(/opt/scripts/capsvc_disabled.sh)"
if [[ -n "$CAP_DISABLED_SERVICES" ]]; then
CAP_DISABLED_SERVICES=(${CAP_DISABLED_SERVICES// / })
for svc in "${CAP_DISABLED_SERVICES[@]}"
do
# remove the services, which should not be toggled, based on capability
SERVICES_TO_TOGGLE=$(echo "$SERVICES_TO_TOGGLE" | sed "/^$svc$/d")
done
fi
SERVICES_TO_TOGGLE="${SERVICES_TO_TOGGLE//$'\n'/ }"
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/services/enabled-services", "value": "'"$SERVICES_TO_TOGGLE"'"}]' || true
This script:
- Combines services from the services.list file with services returned by capsvc_enabled.sh
- Removes duplicates and sorts the list
- Removes services that should be disabled (from
capsvc_disabled.sh
) - Saves the final list in the vaconfig object under the path
/spec/services/enabled-services
- Getting Service States – The script retrieves information about service states from the configuration object:
STATES=$(kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.services.states // "{}"')
This command extracts a JSON object containing current states of all services or returns an empty object {}
if state information is not available.
- Service Toggling – The most important stage is calling the
toggle_services.sh
script, which handles actual service toggling:
/opt/scripts/toggle_services.sh "$VALUES" "$APP_SELECTOR" "$TARGET_PORT" "$EXTRA_VALUES" "$STATES" "" "true" "$UPSTALL_STATUS_DIR"
This script accepts a number of parameters:
- $VALUES – configuration value string
- $APP_SELECTOR – application selector
- $TARGET_PORT – target port
- $EXTRA_VALUES – additional values
- $STATES – JSON object with service state information
- "" – empty string (exclusion of specific services)
- "true" – force service restart
- $UPSTALL_STATUS_DIR – directory for monitoring status
The toggle_services.sh
script contains advanced logic for toggling services:
#!/bin/bash
set -e
set -x
VALUES=$1
APP_SELECTOR=$2
TARGET_PORT=$3
TOGGLES=$4
STATES=$5
EXCLUDE_SERVICES=$6
FORCE_SERVICES_RESTART=$7
UPSTALL_STATUS_DIR=$8
if [[ -z $STATES ]]; then
STATES="{}"
fi
source /opt/scripts/helm_utils.sh
cd /opt/charts
export -f helm-toggle-state
export -f helm-upstall
export -f do-helm-upstall
source /opt/scripts/retry_utils.sh
export -f retry_backoff
allServices=$(kubectl get vaconfig prelude-vaconfig -o json | jq -ej '.spec.services."enabled-services"' | sed 's/\s\+/\n/g')
if [[ -n "$EXCLUDE_SERVICES" ]]; then
    services_to_excludeList=(${EXCLUDE_SERVICES//,/ })
    for i in "${services_to_excludeList[@]}"
    do
        SERVICES_TO_TOGGLE=$(echo "$allServices" | sed "/^$i$/d")
    done
fi
SERVICES_TO_TOGGLE="${allServices//$'\n'/ }"
for svc in ${allServices[@]}
do
    if [[ $(jq "has(\"$svc\")" <<< "$STATES") == false ]]; then
        STATES="$(jq ". + {\"$svc\": true}" <<< "$STATES")"
    fi
done
echo "$allServices" | xargs -t -n 1 -P 0 -I % bash -c "helm-toggle-state % '$VALUES' '$NAMESPACE_PRELUDE' '$APP_SELECTOR' '$TARGET_PORT' '$TOGGLES' '$STATES' '$FORCE_SERVICES_RESTART' '$UPSTALL_STATUS_DIR'"
This extensive script performs the following operations:
- Variable Initialization – Gets the parameters passed from the main deploy.sh script.
- Loading Helper Modules – Uses functions from helm_utils.sh and retry_utils.sh for Helm chart management and retry mechanisms.
- Getting the Service List – Extracts the list of services to be toggled from the configuration object.
- Handling Exclusions – If a list of services to exclude was passed, removes them from the list of services to toggle.
- State Initialization – For each service that doesn’t have a defined state in the STATES object, adds a default state of true.
- Parallel Service Toggling – Uses the xargs command with the -P 0 option (no process limit) to run the helm-toggle-state function in parallel for each service.
The helm-toggle-state
function is responsible for actual toggling of individual services:
helm-toggle-state() {
local name=$1
local values=$2
local namespace=$3
local app_selector=$4
local target_port=$5
local toggles=$6
local states=$7
local force_services_restart=$8
local upstall_status_dir=$9
local extra_values=""
if echo $states | jq -e '."'$name'"' > /dev/null; then
extra_values="disable=false,$toggles"
else
extra_values="disable=true,service.selector.app=$app_selector,service.port.targetPort=$target_port,$toggles"
fi
force_reinstall_flag=""
if [[ "$force_services_restart" == "true" ]]; then
force_reinstall_flag="--force-reinstall"
fi
local upstall_status_dir_arg=""
if [ "$upstall_status_dir" != "" ]; then
upstall_status_dir_arg="CHECK_DIR=${upstall_status_dir}"
fi
helm-upstall "$name" "$values" "$namespace" "$extra_values" "$force_reinstall_flag" "$upstall_status_dir_arg"
}
This function:
- Analyzes whether the service should be enabled or disabled based on the states object
- Sets appropriate values for Helm (disable=false or disable=true)
- Adds the --force-reinstall flag if a service restart is required
- Calls the helm-upstall function with appropriate parameters (see the illustrative call below)
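For illustration, a single invocation of helm-toggle-state with the nine positional parameters described above might look like this (the service name, selector, port, and status directory are placeholders, not values taken from deploy.sh):
# Placeholder values for illustration only.
helm-toggle-state example-service "$VALUES" prelude example-app 8080 "" \
    '{"example-service": true}' "true" "/tmp/deploy_20240101000000"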
After toggling services, the script monitors their state:
log_stage "Waiting for services to start"
# Check status of all services
timeout 300s bash -c 'while vracli service status | grep -qvE "Running|Disabled|N/A"; do echo "Waiting for services..."; sleep 10; done'
# Verify that all required services are running
required_services="identity-service rabbitmq-ha orchestrator-service"
for service in $required_services; do
if ! vracli service status | grep "$service" | grep -q "Running"; then
log_stage "Service $service is not running. Deployment failed."
exit 1
fi
done
This fragment:
- Waits up to 300 seconds until all services reach the “Running”, “Disabled”, or “N/A” state
- Checks if key services (identity-service, rabbitmq-ha, orchestrator-service) are in the “Running” state
- If any of the key services is not running, ends the deployment with an error
The service toggling phase is key for several reasons:
- Activation Control – Allows selectively enabling or disabling services as needed
- Service Restart – Enables forcing service restart, which may be necessary after configuration changes
- Operation Verification – Ensures that all required services are active and working properly
- Parallel Execution – Speeds up the process through parallel operation execution
- Error Handling – Implements detection and reporting of service problems
Thanks to these mechanisms, the deploy.sh script ensures that all necessary services are properly started and configured, which is a key condition for the operation of the VMware Aria Automation platform.
13. Deployment Finalization and Ready State Setting
The final phase of the deploy.sh script includes operations finalizing the deployment, including cleaning of temporary resources, resetting database migration locks, setting the readiness flag, and notifying the user of successful deployment completion. This phase is crucial to ensure that the system is fully operational and ready to use.
# Force generation of new service status
vracli service status --ignore-cache || true
log_stage "Clearing liquibase locks"
vracli reset liquibase --confirm
# Set the deploy ready state and update generation for liagent lcc action
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/deploy/ready", "value": true},
{"op": "replace", "path": "/spec/deploy/generation", "value": "'"$(date +%s)"'"}]'
# Final cleanup and success message
clear || true
echo
echo "Prelude has been deployed successfully"
echo
This code fragment performs several important tasks:
1. Forcing Generation of Current Service Status
vracli service status --ignore-cache || true
This command:
- Calls vracli service status with the --ignore-cache option, which forces skipping the cache and retrieving the status of all services again
- The || true operator ensures that the script will continue even if this command returns an error
Forcing service status refresh is important because:
- It ensures the administrator receives the most up-to-date information about system state
- It initializes the internal service status cache, which will speed up subsequent operations
- It verifies that all services have been properly started
2. Clearing Liquibase Locks
log_stage "Clearing liquibase locks"
vracli reset liquibase --confirm
Liquibase is a tool used by many VMware Aria Automation components to manage database schema migrations. During migration, Liquibase establishes locks to prevent concurrent migrations that could damage the schema.
The vracli reset liquibase --confirm
command:
- Removes all Liquibase locks from databases
- Requires confirmation (--confirm) to prevent accidental execution
- Is crucial if a previous deployment was interrupted or ended with an error, which could leave locks behind
Clearing Liquibase locks is essential because:
- Remaining locks can prevent future schema migrations
- They can cause errors when starting services that use the migration mechanism
- Clearing them ensures a clean state for future updates and configuration changes
3. Setting the Deployment Ready Flag
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/deploy/ready", "value": true},
{"op": "replace", "path": "/spec/deploy/generation", "value": "'"$(date +%s)"'"}]'
This command updates the prelude-vaconfig
configuration object using a PATCH operation, making two changes:
- Sets the /spec/deploy/ready flag to true, signaling that deployment has completed and the system is ready to use
- Updates the /spec/deploy/generation value to the current Unix timestamp (number of seconds since January 1, 1970), which allows identifying the deployment version
Setting the readiness flag is crucial because:
- It informs other system components that deployment has succeeded
- It allows liagent agents to perform LCC (Life Cycle Configuration) actions
- It serves as a reference point for monitoring and management tools
- It provides a consistent way to check deployment state
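For example, the flag and generation timestamp can be read back with a simple kubectl query (illustrative command based on the vaconfig paths shown above):
kubectl get vaconfig prelude-vaconfig \
    -o jsonpath='{.spec.deploy.ready}{" "}{.spec.deploy.generation}{"\n"}'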
4. Final Cleanup and Success Message
clear || true
echo
echo "Prelude has been deployed successfully"
echo
These commands:
- Clear the screen (clear) to provide a good view of the final message (the || true operator ensures that the script will continue even if clear fails)
- Display a clear message about successful deployment
The success message is important because:
- It gives the user unambiguous confirmation that deployment has succeeded
- It constitutes a clear end to the deployment process
- It serves as a reference point in logs
This entire finalization phase ensures that:
- The system is in a ready-to-use state
- There are no remaining locks or unfinished operations
- The configuration is properly updated with the readiness flag
- The user receives clear confirmation of success
It’s worth noting that the script doesn’t end immediately after the success message but returns to the on_exit
function registered at the beginning via trap on_exit EXIT
. This function performs final cleanup operations:
on_exit() {
# ... error handling code ...
# Remove temporary helm_upstall check directory
if [ -n "$UPSTALL_STATUS_DIR" ]; then
clear-helm-upstalls-status $UPSTALL_STATUS_DIR true
fi
# Clear the value of property cache.timeout in vracli.conf file
# Do not generate new service status
vracli service status --unset-config service.status.cache.lifetime || true
rm -rf /tmp/deploy.tmp.*
}
These operations ensure that:
- Temporary Helm status files are removed
- Service status cache configuration is restored to default values
- All temporary files are cleaned up
Thanks to this comprehensive finalization phase, the deploy.sh script ensures that the system is left in a clean, consistent, and ready-to-use state, which is key to stable operation of the VMware Aria Automation platform.
Security and Advanced Credential Management
The deploy.sh script implements an extensive security and credential management system that is fundamental to ensuring protection of the entire VMware Aria Automation platform. This section analyzes how the script generates, stores, and distributes different types of credentials and how it ensures secure communication between components.
Password Generation and Management
One of the key aspects of platform security is generating strong, unique passwords for various components. The script uses specialized tools to create such credentials:
credential_add_from_command "postgres" /opt/scripts/generate_pass.sh
credential_add_from_command "redis" /opt/scripts/generate_pass.sh
credential_add_from_command "lemans-resources-db" /opt/scripts/generate_pass.sh
The credential_add_from_command
function executes the given command and adds its result as a credential for a specified key:
credential_add_from_command() {
local key=$1
shift
if [ "$1" == "--force" ]; then
shift
elif credential_exists "$key"; then
return 0
fi
value=$("$@" | base64 -w 0)
kubectl patch secret db-credentials -n prelude --type=json -p="[{\"op\":\"add\", \"path\":\"/data/$key\", \"value\":\"$value\"}]"
}
This function:
- Checks if the credential already exists (unless the --force option is used)
- Calls the specified password-generating command
- Base64 encodes the result
- Adds the encoded value to the
db-credentials
Kubernetes secret
The generate_pass.sh
script creates a 32-character random alphanumeric string:
#!/bin/bash
tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 32
This password generator:
- Uses the
/dev/urandom
device as an entropy source, ensuring cryptographic quality randomness - Filters the stream, keeping only alphanumeric characters
- Takes the first 32 characters, giving a 32-character password
- Contains no predictable patterns or constant values
This approach ensures that each deployment has unique, strong passwords, which is key to platform security.
Generating Different Types of Cryptographic Keys
Besides passwords, the script generates various types of cryptographic keys that are used for different purposes:
# EncoderSalt key (32 bytes in base64)
credential_add_from_command "identity-encoder-salt" /opt/scripts/generate_encryption_key_base64.sh 32
# Encryption key (48 bytes)
credential_add_from_command "key" /opt/scripts/generate_encryption_key.sh 48
# RSA key for JWT (2048 bits)
credential_add_from_command "rsaKey" /opt/scripts/generate_rsa_encryption_key.sh 2048
# JSON objects with keys
credential_add_from_command "encryption-keys.json" bash -c 'echo "{"primary":1,"keys":[{"version":1,"value":"$(/opt/scripts/generate_encryption_key_base64.sh 32)"}]}"'
Each key type is generated using a specialized script:
- Base64-encoded keys (generate_encryption_key_base64.sh):
#!/bin/bash
/usr/bin/openssl rand -base64 "$1"
This script:
- Uses OpenSSL to generate a random byte string of specified length
- Encodes the result in base64, giving a text representation
- Is used for keys that must be stored in text format
- Raw binary keys (generate_encryption_key.sh):
#!/bin/bash
/usr/bin/openssl rand "$1"
This script:
- Generates a random byte string without encoding
- Is used for keys that will be processed internally in binary format
- RSA keys (generate_rsa_encryption_key.sh):
#!/bin/bash
/usr/bin/openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:"$1"
This script:
- Generates an RSA key pair of specified length (e.g., 2048 bits)
- Returns the private key in PEM format
- Is used to generate keys for signing JWT tokens, asymmetric encryption, and other purposes
- Complex JSON objects with keys:
echo "{"primary":1,"keys":[{"version":1,"value":"$(/opt/scripts/generate_encryption_key_base64.sh 32)"}]}"
This construction:
- Creates a JSON object containing an encryption key
- Adds metadata such as primary key ID and version
- Is used by components that implement key rotation
Different key types are used for different purposes:
- Salt keys are used in hashing processes
- Symmetric keys are used for encrypting data at rest
- RSA keys are used for asymmetric encryption and token signing
- JSON objects with keys are used by components handling key rotation
Secure Credential Storage
All generated credentials are securely stored in Kubernetes Secrets and then saved in the vaconfig object for persistence between deployments:
credentials_save() {
secrets=$(kubectl get secret db-credentials -n prelude -o yaml | base64)
crdssecret=$(echo $secrets | tr -d '\n')
kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/crdssecret", "value": "'"$crdssecret"'"}]'
}
This function:
- Gets the entire
db-credentials
secret in YAML format - Encodes it in base64
- Removes newline characters to get a uniform string
- Saves this string in the vaconfig object under the
/spec/crdssecret
path
During redeployment, credentials are restored from the vaconfig object:
credentials_load() {
tmpfile=$(mktemp)
kubectl get vaconfig prelude-vaconfig -o json | jq -r '.spec.crdssecret' | base64 -d > $tmpfile
if [ ! -s $tmpfile ]; then
rm -f $tmpfile
kubectl -n prelude create secret generic db-credentials --from-literal=postgres=change_me
else
kubectl apply -f $tmpfile
rm -f $tmpfile
fi
}
This function:
- Creates a temporary file
- Gets the encoded
crdssecret
value from the vaconfig object - Decodes it from base64 and writes to the file
- If the file is empty (no saved credentials), creates a basic secret
- Otherwise applies the saved secret
This mechanism ensures:
- Credential persistence between deployments
- Abstraction layer through Kubernetes Secrets usage
- Secure storage of sensitive data
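To verify that a particular credential was generated and persisted, it can be read back from the secret (illustrative command; the key name postgres matches the example above):
kubectl -n prelude get secret db-credentials -o jsonpath='{.data.postgres}' | base64 -d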
SSL Certificate Management
The deploy.sh script implements a comprehensive SSL certificate management system, which is key to secure communication between components:
log_stage "Applying ingress certificate"
/opt/scripts/prepare_certs.sh
/opt/scripts/apply_certs.sh
The prepare_certs.sh
script generates a self-signed certificate if one doesn’t exist:
#!/bin/bash
set -e
# Generate self-signed certificate for ingress if such does not exist
vracli certificate ingress --list &>/dev/null || vracli certificate ingress --generate auto --set stdin
The apply_certs.sh
script installs certificates in the Kubernetes cluster:
#!/bin/bash
CERT_INGRESS_PEM=$(mktemp --suffix=ingress.pem)
CERT_INGRESS_KEY=$(mktemp --suffix=ingress.key)
CERT_FQDN_PEM=$(mktemp --suffix=fqdn.pem)
CERT_PROXY_PEM=$(mktemp --suffix=proxy.pem)
# ... removing existing secrets ...
vracli certificate ingress --list-key > $CERT_INGRESS_KEY
vracli certificate ingress --list > $CERT_INGRESS_PEM
kubectl create secret tls cert-ingress \
    --cert=${CERT_INGRESS_PEM} \
    --key=${CERT_INGRESS_KEY} \
    -n ingress || exit $?
rm -f ${CERT_INGRESS_KEY}
# ... handling load-balancer and proxy certificates ...
This script:
- Creates temporary files for certificates
- Gets the ingress certificate and its private key
- Creates a TLS secret in the ingress namespace
- Immediately removes the file with the private key
- Handles load-balancer and proxy certificates
Additionally, certificates are used during endpoint registration:
CERT="$(vracli certificate load-balancer --list || vracli certificate ingress --list)"
CERT_JSON=$(jq --null-input --compact-output --arg str "$CERT" '$str')
# ... later in code ...
-d '{"endpointProperties":{"hostName":"'$INGRESS_URL':443","dcId":"0","privateKeyId":"vcoadmin","privateKey":"vcoadmin","certificate":'"${CERT_JSON}"',"acceptSelfSignedCertificate":true,"vroAuthType":"CSP"}...
This fragment:
- Gets the load-balancer certificate, or if it doesn’t exist, uses the ingress certificate
- Converts it to JSON format
- Includes it in endpoint registration data
The certificate management system ensures:
- Automatic certificate generation if they don’t exist
- Secure certificate storage in Kubernetes secrets
- Immediate removal of sensitive data (private keys) after use
- Consistent certificate usage throughout the system
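The installed ingress certificate can be inspected afterwards, for example to confirm its subject and expiry (illustrative command using the cert-ingress secret created above):
kubectl -n ingress get secret cert-ingress -o jsonpath='{.data.tls\.crt}' \
    | base64 -d | openssl x509 -noout -subject -enddate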
Sensitive Data Protection Mechanisms
The deploy.sh script implements several mechanisms ensuring protection of sensitive data:
Disabling Command Display – Before operations on sensitive data:
set +x
This instruction disables displaying executed commands, preventing password and key disclosure in logs.
Using Temporary Files with Automatic Removal – For operations requiring files:
tmpfile=$(mktemp)
# ... operations on file ...
rm -f $tmpfile
This pattern ensures that sensitive data is immediately removed after use.
Base64 Encoding – For storing data in Kubernetes objects:
value=$("$@" | base64 -w 0)
Base64 encoding protects against accidental sensitive data disclosure in logs and debugging.
Private Key Protection – After creating secrets with keys:
rm -f ${CERT_INGRESS_KEY}
Immediate removal of files with private keys minimizes their exposure risk.
Storing Secrets in Dedicated Kubernetes Objects:
kubectl -n prelude create secret generic db-credentials ...
Using Kubernetes Secrets provides an additional protection layer, including:
- Access control at RBAC level
- Optional encryption at rest
- Limited pod access
This entire comprehensive security and credential management system ensures:
- Strong, unique passwords and keys for each deployment
- Various cryptographic key types tailored to specific needs
- Secure credential storage and distribution
- Sensitive data protection throughout the lifecycle
- Automatic SSL certificate installation and configuration
Thanks to these mechanisms, the deploy.sh script establishes solid security foundations for the entire VMware Aria Automation platform, protecting both data and communication between components.
Component Architecture and Their Dependencies
The deploy.sh script manages a comprehensive ecosystem of cooperating components that together form the VMware Aria Automation platform. This section analyzes the architecture of these components, their roles, and the dependencies between them.
Infrastructure Components
The foundation of the platform are infrastructure components that provide basic services for other system elements:
1. Kubernetes – Container Orchestration Platform
Kubernetes forms the basis of the entire architecture, providing:
- Container orchestration (pods, deployments, statefulsets)
- Network management (services, ingress)
- Persistent data storage (persistent volumes)
- Configuration management (configmaps, secrets)
- Automatic recovery after failures
The deploy.sh script heavily uses the Kubernetes API:
kubectl create namespace "${NAMESPACE_PRELUDE}"
kubectl apply -f ...
kubectl patch vaconfig prelude-vaconfig ...
kubectl get pods ...
2. Helm – Package Manager for Kubernetes
Helm is used to define, install, and update Kubernetes components:
helm-upstall db-settings "" "${NAMESPACE_PRELUDE}"
helm-upstall identity "$VALUES" "$NAMESPACE_PRELUDE"
helm-toggle-state "$name" "$values" "$namespace" "$extra_values"
The helm-upstall
function is an advanced wrapper around helm upgrade/install
that ensures operation idempotency.
3. PostgreSQL – Database System
PostgreSQL serves as the data storage layer for most platform components. The script handles two configurations:
- Single-DB: One PostgreSQL instance serving all services
- Multi-DB: Dedicated instances for each service, providing better isolation and performance
function deploy_databases()
{
local multi_db="$1"
# ...
if [[ "$multi_db" == false ]]
then
databases=("postgres")
else
# Getting database list for each service
databases=$(kubectl get configmap db-settings -n ${namespace} -o json | jq -r ".data| keys[]"| grep -v "postgres" | grep -v "repmgr")
fi
# ...
}
Databases are deployed with replication for high availability, with automatic primary-node detection:
function get_primaries()
{
database_pods=$(kubectl get pods -n prelude -o custom-columns=:metadata.name,:spec.containers[0].image | grep db-image | cut -d " " -f1 | grep "0")
for pod in ${database_pods[@]}
do
pod_data=$(kubectl exec -n prelude ${pod} -- bash -c "chpst -u postgres repmgr node check --upstream 2>/dev/null")
# ... primary node detection ...
done
}
4. RabbitMQ – Message Queuing System
RabbitMQ provides asynchronous communication between components, which is key for microservice architecture:
helm-upstall rabbitmq-ha "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &
RabbitMQ configuration includes:
- Credential generation:
credential_add_from_command "rabbitmq" /opt/scripts/generate_pass.sh
credential_add_from_command "rabbitmqConfig" /opt/scripts/generate_rmq_config.sh "$(credential_get "rabbitmq")"
credential_add_from_command "rabbitmq-erlang-cookie" /opt/scripts/generate_pass.sh - Configuration as a high-availability cluster
- Definition of virtual hosts, users, and permissions
5. LDAP/vIDM – Identity Management Systems
The platform supports two identity management systems:
- vIDM (VMware Identity Manager) – preferred for production environments
- OpenLDAP – used in simpler deployments, not recommended for production
if output=$(vracli vidm); then
identity_profile=vidm
# ... vIDM configuration ...
elif ldap=$(kubectl get vaconfigs.prelude.vmware.com prelude-vaconfig -o json | jq -e .spec.ldap); then
echo "
#####################################################
# LDAP deployments are not meant for production use #
# and are not supported in HA environments! #
#####################################################
"
identity_profile=ldap
# ... LDAP configuration ...
fi
The choice of identity system affects many subsequent operations, such as endpoint registration and service configuration.
Service Components
Based on the infrastructure, numerous service components operate, providing platform functionality:
1. Ingress – Incoming Traffic Management
Ingress Controller manages incoming traffic to the platform:
k8s_create_namespace "${NAMESPACE_INGRESS}"
kubectl create secret tls cert-ingress --cert=${CERT_INGRESS_PEM} --key=${CERT_INGRESS_KEY} -n ingress
It provides:
- SSL/TLS termination
- Host-name and path-based routing
- Load balancing
2. Identity Service – Identity and Authorization Service
Identity Service is the central authentication and authorization point:
helm-upstall identity "$VALUES" "$NAMESPACE_PRELUDE" && wait_release identity
Its responsibilities include:
- User authentication
- Session management
- OAuth token handling
- vIDM or LDAP integration
3. Provisioning Service – Resource Provisioning Service
Provisioning Service manages endpoints and resources:
PROVISIONING_URL="http://provisioning-service.prelude.svc.cluster.local:8282"
# ... later in code ...
curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer '$CSP_AUTH_TOKEN
# ...
It is responsible for:
- Endpoint registration and management
- Resource provisioning
- Communication with external systems
4. vRealize Orchestrator (vRO) – Orchestration Engine
vRO enables orchestration of complex processes:
/opt/scripts/register_vro_endpoint.sh
This component:
- Executes workflows
- Integrates with external systems
- Provides API for automation
5. Action Based Extensibility (ABX) – Extension Mechanism
ABX provides serverless computing functions for the platform:
if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
/opt/scripts/register_abx_endpoint.sh
fi
It’s based on OpenFaaS:
OPENFAAS_ADDRESS="http://gateway.openfaas.svc.cluster.local:8080"
And enables:
- Function execution in response to events
- Platform extension without modifying core code
- Integration with external systems
6. Adapter Host Service – Adapter Hosting Service
if [[ "$ENABLE_ADAPTER_HOST_SVC" == "true" ]]; then
VALUES="$VALUES,enableAdapterHostSvc=true"
else
VALUES="$VALUES,enableAdapterHostSvc=false"
fi
This service:
- Hosts integration adapters
- Serves as a bridge between the platform and external systems
- Isolates integration logic
Helper Components
In addition to main infrastructure and service components, the platform includes helper tools:
1. CSP (Cloud Services Platform) – Cloud Services Platform
CSP is used to manage identity, organizations, and services:
source /opt/scripts/csp_functions.sh
csp_auth "$admin_client_id" "$admin_client_secret"
csp_retrieve_orgs
This component:
- Manages OAuth clients
- Handles organizations and services
- Provides API for administrative operations
2. Liquibase – Database Schema Migration System
Liquibase automates database schema management:
log_stage "Clearing liquibase locks"
vracli reset liquibase --confirm
It provides:
- Schema versioning
- Controlled migrations
- Safe data structure updates
3. etcd – Cluster Configuration Storage System
etcd stores cluster and application configuration:
vracli proxy update-etcd
It is used for:
- Storing Kubernetes configuration
- Distributing settings between components
- Tracking service states
4. Health Check – System State Checking Mechanisms
The Health Check system monitors platform state:
wait_deploy_health() {
while true; do
/opt/health/run-once.sh deploy && break || sleep 5
done
}
It provides:
- Checking status of all components
- Problem detection
- System readiness tracking
Component Dependency Diagram
Dependencies between components can be represented as follows:
                      +-------------+
                      |             |
                      | Kubernetes  |
                      |             |
                      +------+------+
                             |
           +-----------------+-----------------+
           |                 |                 |
   +-------v------+  +-------v------+  +-------v------+
   |              |  |              |  |              |
   |  PostgreSQL  |  |   RabbitMQ   |  |     etcd     |
   |              |  |              |  |              |
   +-------+------+  +-------+------+  +-------+------+
           |                 |                 |
   +-------v------+  +-------v------+  +-------v------+
   |              |  |              |  |              |
   |   Identity   |  |     CSP      |  | Health Check |
   |   Service    |  |              |  |              |
   +-------+------+  +-------+------+  +--------------+
           |                 |
   +-------v------+  +-------v------+
   |              |  |              |
   | Provisioning |  |     vRO      |
   |   Service    |  |              |
   +-------+------+  +-------+------+
           |                 |
   +-------v------+  +-------v------+
   |              |  |              |
   |     ABX      |  | Adapter Host |
   |              |  |   Service    |
   +--------------+  +--------------+
This architecture shows:
- The fundamental role of Kubernetes as the base platform
- The dependency of all components on infrastructure (PostgreSQL, RabbitMQ, etcd)
- The key role of Identity Service and CSP for other services
- The higher layers of specialized services (ABX, Adapter Host)
Communication Flow Between Components
Communication between components relies on several mechanisms:
1. REST API – Used by most services:
curl -k -f "$PROVISIONING_URL/provisioning/mgmt/endpoints?enumerate&external" \
    -H 'Content-Type: application/json' -H 'Authorization: Bearer '$CSP_AUTH_TOKEN
# ...
2. RabbitMQ Message Queues – For asynchronous communication:
credential_add_from_command "rabbitmqConfig" /opt/scripts/generate_rmq_config.sh "$(credential_get "rabbitmq")"
3. Kubernetes Secrets – For credential distribution (see the read-back sketch after this list):
kubectl -n prelude create secret generic db-credentials ...
4. Kubernetes ConfigMaps – For configuration distribution:
kubectl -n "$NAMESPACE_PRELUDE" create configmap identity-clients \
    --from-literal=clients="${identity_managed_clients:2}"
5. etcd – For storing and sharing configuration:
vracli proxy update-etcd
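As a quick illustration of the Secrets mechanism from point 3, a stored credential can be read back with standard kubectl tooling; the password key name below is an assumption about the secret’s layout:
# Read one key back out of the db-credentials secret (key name assumed for illustration)
kubectl -n prelude get secret db-credentials -o jsonpath='{.data.password}' | base64 -d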
This complex architecture of components and their dependencies is precisely managed by the deploy.sh script, which ensures proper deployment order, configuration, and integration of all elements. Thanks to this, the VMware Aria Automation platform operates as a cohesive, integrated system, despite its internal complexity and modular structure.
Architectural Patterns and Best Practices
The deploy.sh script implements numerous architectural patterns and best practices that ensure reliability, security, and flexibility of the deployment process. This section analyzes key patterns and practices used in the script, which constitute a valuable knowledge source for administrators and developers.
1. Idempotency
Idempotency is a property of operations that can be performed multiple times without changing the result after the first application. The deploy.sh script implements this pattern at many levels:
Idempotent Kubernetes Namespace Creation:
function k8s_create_namespace() {
    local ns="$1"
    if [[ $(kubectl get namespaces --no-headers | cut -f 1 -d ' ' | grep -x "$ns" | wc -l) == 0 ]]; then
        kubectl create namespace "$ns"
    fi
}
This function checks if a namespace already exists and creates it only if it’s missing, allowing multiple calls without errors.
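The same effect can be achieved with kubectl’s own declarative idiom, shown here purely as an equivalent alternative to the check-then-create pattern above:
# Equivalent idempotent idiom: render the namespace manifest and apply it declaratively
kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -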
Idempotent Helm Operations:
helm-upstall() {
    # ... initialization code ...
    /opt/scripts/helm-upstall --namespace="$3" --release-name="$release_name" --chart-path="$service_name" --set-string="$2" --set="$4" --timeout="$6" $5 || result=$?
    # ... result handling code ...
}
The helm-upstall function combines upgrade and install operations, ensuring that a chart will be installed if it doesn’t exist yet, or upgraded if it already does.
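Under the hood this corresponds to Helm’s standard upgrade --install idiom; a generic sketch with placeholder release, chart, and namespace names:
# Generic "upstall": install the release if absent, upgrade it if already present
helm upgrade --install my-release ./my-chart --namespace prelude --timeout 10m --wait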
Idempotent Credential Generation:
credential_add_from_command() {
    local key=$1
    shift
    if [ "$1" == "--force" ]; then
        shift
    elif credential_exists "$key"; then
        return 0
    fi
    # ... credential generation and addition ...
}
This function checks whether a credential already exists and generates a new one only if it’s missing or the --force option is given.
Benefits of idempotency:
- Ability to safely repeat deployment operations
- Resilience to interruptions and script restarts
- Ease of fixing partially completed deployments
- Reduction of errors during updates
2. Separation of Concerns
The deploy.sh script implements the separation of concerns pattern, dividing functionality into independent, specialized modules:
Using Specialized Helper Scripts:
source /opt/scripts/persistence_utils.sh
source /opt/scripts/db_utils.sh
/opt/scripts/prepare_certs.sh
/opt/scripts/apply_certs.sh
/opt/scripts/generate_credentials.sh
/opt/scripts/register_vro_endpoint.sh
Each script focuses on one specific task, which improves readability, simplifies maintenance, and enables code reuse.
Structural Logic Organization:
log_stage "Creating kubernetes namespaces"
# ... namespace creation ...
log_stage "Applying ingress certificate"
# ... certificate handling ...
log_stage "Deploying infrastructure services"
# ... service deployment ...
The script is organized into logical sections, each with a clear purpose, which facilitates understanding the flow and debugging.
Using Functions to Encapsulate Logic:
function k8s_delete_namespace() {
    # ... complex namespace deletion logic ...
}
function backup_db_before_destroy() {
    # ... database backup logic ...
}
Defining functions that encapsulate complex logic improves modularity, readability, and code reuse.
Benefits of separation of concerns:
- Easier understanding and code maintenance
- Ability to test and develop individual components independently
- Better dependency management
- Flexibility in adapting or replacing modules
3. Error Handling
The deploy.sh script implements layered and resilient error handling that ensures deployment process reliability:
Signal Traps:
trap on_exit EXIT
The on_exit function is called whenever the script ends (regardless of the reason), ensuring proper cleanup and diagnostics even in case of failure.
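The body of on_exit is not reproduced in this article, but a handler of this kind typically inspects the exit code and collects diagnostics before the shell terminates. A hypothetical sketch of its shape:
# Hypothetical shape of an EXIT handler; the real on_exit in deploy.sh may differ
on_exit() {
    local rc=$?
    if [[ $rc -ne 0 ]]; then
        echo "deploy.sh failed with exit code $rc, collecting diagnostics..." >&2
        # e.g. gather service logs and pod states into a support bundle here
    fi
    # always-run cleanup of temporary files and transient resources goes here
}
trap on_exit EXIT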
Controlled Termination:
die() {
    local msg=$1
    local exit_code=$2
    if [ $# -lt 2 ]; then
        exit_code=1
    fi
    set +x
    clear || true
    echo $msg
    exit $exit_code
}
The die function ensures controlled termination with an informative message and an appropriate exit code.
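A typical call site looks like this (the configuration-file check is just an example guard, not a line taken from deploy.sh):
[[ -f "$CONFIG_FILE" ]] || die "Configuration file $CONFIG_FILE not found" 2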
Selective Error Ignoring:
vracli ntp show-config || true
kubectl patch vaconfig prelude-vaconfig --type json -p '[...]' || true
The || true operator ensures that the script continues even if certain commands fail.
Retry Mechanisms:
retry_backoff "5 15 45" "Failed to load existing vRO config" "load_existing_config"
The retry_backoff function retries an operation with increasing delays, providing resilience to transient problems.
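The implementation of retry_backoff is not reproduced here; a minimal sketch of the same idea, assuming the first argument is a space-separated list of delays, the second an error message, and the rest the command to run:
# Minimal sketch of retry-with-backoff; the real retry_backoff may differ in details
retry_backoff_sketch() {
    local delays="$1" message="$2"
    shift 2
    local delay
    for delay in $delays; do
        "$@" && return 0
        echo "Attempt failed, retrying in ${delay}s..." >&2
        sleep "$delay"
    done
    "$@" && return 0    # one final attempt after the last delay
    echo "$message" >&2
    return 1
}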
Benefits of advanced error handling:
- Increased deployment process reliability
- Better user experience due to informative messages
- Automatic diagnostics and log package generation
- Resilience to temporary infrastructure problems
4. Automation
The deploy.sh script is an excellent example of complete automation of a complex deployment process:
Automatic Detection and Configuration:
if output=$(vracli vidm); then
    identity_profile=vidm
    # ... vIDM configuration ...
elif ldap=$(kubectl get vaconfigs.prelude.vmware.com prelude-vaconfig -o json | jq -e .spec.ldap); then
    identity_profile=ldap
    # ... LDAP configuration ...
fi
The script automatically detects the available identity system and adjusts further actions.
Automatic Database Layout Detection:
local database_directories=(/data/db/p-*)
if [[ -d "/data/db/live" ]]
then
    multi_db_previous=false
elif [[ -d "${database_directories[0]}/live" ]]
then
    multi_db_previous=true
fi
The script automatically detects the previous database layout (single-DB or multi-DB) and initiates migration if needed.
Parallel Operation Execution:
helm-upstall endpoint-secrets "..." "$NAMESPACE_PRELUDE" &
helm-upstall no-license "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &
helm-upstall rabbitmq-ha "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR} &
The &
symbol runs processes in the background, allowing parallel execution of independent operations and significantly speeding up deployment.
echo "$allServices" | xargs -t -n 1 -P 0 -I % bash -c "helm-toggle-state % '$VALUES' '$NAMESPACE_PRELUDE' ..."
Using xargs -P 0 enables parallel processing of multiple services.
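When work is fanned out with &, the results still have to be collected. A common companion pattern (illustrative, not quoted from deploy.sh) waits for every background job and fails if any of them failed:
# Fan out independent deployments, then fail if any background job failed
pids=()
helm-upstall endpoint-secrets "..." "$NAMESPACE_PRELUDE" & pids+=($!)
helm-upstall rabbitmq-ha "$VALUES" "$NAMESPACE_PRELUDE" & pids+=($!)
failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
done
[[ $failed -eq 0 ]] || die "One or more parallel deployments failed"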
Benefits of automation:
- Elimination of human errors
- Significant deployment process acceleration
- Repeatability and consistency of deployments
- Possibility of integration with CI/CD systems
5. Configuration Flexibility
The deploy.sh script offers many mechanisms for adapting the deployment process to different needs:
Command-Line Parameters:
displayHelp() {
    echo "Deploy or re-deploy all Prelude services"
    echo ""
    echo "Usage:"
    echo "./deploy.sh [Options]"
    echo ""
    echo "Options:"
    echo "-h --help            Display this message."
    echo "--deleteDatabases    Delete postgres databases of all services."
    # ... many other options ...
}
An extensive command-line option system allows detailed customization of the deployment process.
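Options of this kind are usually consumed with a while/case loop over the positional parameters. A simplified sketch of how such parsing typically looks; only the option names come from the help text above, and the variable being set is an assumption:
# Simplified sketch of command-line parsing; the actual loop in deploy.sh may differ
while [[ $# -gt 0 ]]; do
    case "$1" in
        -h|--help)
            displayHelp
            exit 0
            ;;
        --deleteDatabases)
            DELETE_DATABASES=true
            ;;
        *)
            die "Unknown option: $1" 1
            ;;
    esac
    shift
done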
System Profiles:
for profile in "$PRELUDE_PROFILE_ROOT"/*; do
export PRELUDE_PROFILE_PATH="$profile"
profile_name="${profile##*/}"
if "$profile"/check; then
echo "Profile $profile_name: enabled" >&2
# ... profile application ...
fi
done
The system profile mechanism allows extending functionality without modifying the main script.
Feature Flags:
if [[ "$ENABLE_EXTENSIBILITY_SUPPORT" == "true" ]]; then
/opt/scripts/register_abx_endpoint.sh
fi
if [[ "$ENABLE_ANALYTICS" == "true" ]]; then
helm-upstall analytics-collector "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
helm-upstall analytics-service "$VALUES" "$NAMESPACE_PRELUDE" CHECK_DIR=${UPSTALL_STATUS_DIR}
fi
Feature flags allow enabling or disabling specific components and functionalities.
Benefits of configuration flexibility:
- Adapting deployment to different environments
- Enabling or disabling specific features
- Possibility of extending functionality without modifying the main script
- Ease of testing new components
6. Security
The deploy.sh script implements numerous security mechanisms:
Generating Unique, Strong Credentials:
credential_add_from_command "postgres" /opt/scripts/generate_pass.sh
credential_add_from_command "identity-encoder-salt" /opt/scripts/generate_encryption_key_base64.sh 32
credential_add_from_command "rsaKey" /opt/scripts/generate_rsa_encryption_key.sh 2048
Each component receives unique, cryptographically strong credentials.
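The generators themselves are not shown in this article; conceptually they come down to pulling bytes from a cryptographically secure source, for example (an illustration of the idea, not the contents of generate_pass.sh):
# Illustrative only: derive a strong random password from a CSPRNG
openssl rand -base64 24 | tr -d '/+=' | cut -c1-24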
Secure Sensitive Data Storage:
credentials_save() {
    secrets=$(kubectl get secret db-credentials -n prelude -o yaml | base64)
    crdssecret=$(echo $secrets | tr -d '\n')
    kubectl patch vaconfig prelude-vaconfig --type json -p '[{"op": "add", "path": "/spec/crdssecret", "value": "'"$crdssecret"'"}]'
}
Credentials are stored in Kubernetes secrets.
Immediate Sensitive File Removal:
rm -f ${CERT_INGRESS_KEY}
rm -rf $SSH_DIR
Files containing private keys are removed immediately after use.
Sensitive Data Display Control:
set +x
# ... credential operations ...
set -x
Debug tracing (set -x) is switched off during operations on sensitive data, so credentials do not end up in the logs.
Benefits of built-in security mechanisms:
- Sensitive data protection
- Unique credentials for each deployment
- Secure storage and distribution of secrets
- Minimizing data exposure risk
These architectural patterns and best practices make the deploy.sh script not only an effective deployment tool but also a valuable source of knowledge about advanced automation techniques, security, and complex system management. Many of these patterns can be applied to other projects, not necessarily related to VMware Aria Automation.
Summary
The deploy.sh script in VMware Aria Automation is an extremely advanced tool that serves as an excellent example of best practices in automation, DevOps, and infrastructure management. Its in-depth analysis reveals how modern, complex systems can be deployed in a reliable, secure, and flexible manner.
A deeper understanding of the deploy.sh script operation not only helps more efficiently manage the VMware Aria Automation environment but also constitutes a valuable lesson in advanced automation techniques, complex system management, and DevOps best practice implementation. Many patterns and approaches used in it can be adapted to other projects and systems, especially those based on containerization and microservices.
In a world where IT infrastructure complexity constantly grows, tools like deploy.sh become not just useful but essential for ensuring reliability, security, and operational efficiency of modern technology platforms.