As we know, PCF Ops Manager is a singleton component with no High Availability. If it is corrupted or crashes for some reason, PCF platform-level operations may be hampered, though this does not affect Elastic Runtime (ELR) or running applications.
It is always advisable to test the backup and recovery procedure for Ops Manager. I have done so in these three ways:
1) Export / Import via Ops Manager GUI
2) Using "CF OPS" (http://www.cfops.io/)
3) Image level VADP backup of Ops Manager VM (in case of VMware)
In this test, the assumption is that only Ops Manager is affected, while the BOSH Director and other PCF/ELR VMs are running fine.
1) Export / Import via Ops Manager GUI:
This is a largely manual way of backing up Ops Manager, where you export the installation settings via the Ops Manager GUI.
The export includes base VM images, the necessary packages, and references to the installation IP addresses, so it can be large and may take some time.
The browser then saves the exported content as a file.
Now let us deploy a new Ops Manager with the same IP address and hostname.
Once it is deployed, open a web browser, go to the FQDN of Ops Manager, and choose Import Existing Installation.
Provide the decryption password (set during the first setup of Ops Manager) and the path to the exported installation.zip file, then press Import.
Again, the import may take time depending on the size of the data to be imported.
Once the import completes, the login screen is displayed. Log in with your username/password and you will see a message that the import was successful.
While this method works, it may not be straightforward to automate for regular backups. Now let's test the second method, using CF OPS.
2) "CF OPS" (http://www.cfops.io/)
This is an automation utility from Pivotal that can be downloaded from https://github.com/pivotalservices/cfops/releases and installed on a jumpbox.
Once it is installed, a backup can be triggered with a simple script, and that script can be scheduled via cron or another scheduling tool.
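As a rough sketch of what such a script could look like, here is a small Python wrapper that assembles and runs a cfops backup command. The flag names and the `ops-manager` tile name follow the cfops README as best recalled, so treat them as assumptions and verify them with `cfops backup --help` for your version. The function names `build_cfops_backup_cmd` and `run_backup` are purely illustrative, and every host name, user, password and path passed in is a placeholder.

```python
import subprocess
from typing import List


def build_cfops_backup_cmd(host: str, admin_user: str, admin_pass: str,
                           vm_user: str, vm_pass: str, dest_dir: str,
                           tile: str = "ops-manager") -> List[str]:
    """Assemble the cfops backup command line as an argument list.

    Assumption: flag names are taken from the cfops README and may differ
    between cfops versions; confirm with `cfops backup --help`.
    """
    return [
        "cfops", "backup",
        "--opsmanagerhost", host,        # FQDN or IP of Ops Manager
        "--adminuser", admin_user,       # Ops Manager web console user
        "--adminpass", admin_pass,
        "--opsmanageruser", vm_user,     # SSH user on the Ops Manager VM
        "--opsmanagerpass", vm_pass,
        "--destination", dest_dir,       # directory that receives the backup
        "--tile", tile,
    ]


def run_backup(cmd: List[str]) -> int:
    """Run the command and return its exit code (non-zero means failure)."""
    return subprocess.run(cmd, check=False).returncode
```

Calling `run_backup(build_cfops_backup_cmd(...))` from a small script on the jumpbox, and pointing a cron entry at that script, gives a scheduled backup. Passing the arguments as a list rather than a single shell string avoids quoting problems when passwords contain special characters.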
Now assume that the Ops Manager VM is corrupted/crashed/lost (for this test I am just powering it off).
Now let us deploy a new Ops Manager with the same IP address and hostname.
Once it is deployed, open a web browser and go to the FQDN of Ops Manager. Choose internal authentication for this fresh setup.
This will bring up a fresh Ops Manager without any configuration/tiles.
DO NOT set up the authentication at this stage. Just go ahead with the restoration using CFOPS.
Here the assumption is that the original BOSH Director is still operational.
If this is not the case, rename bosh-state.json in order to force re-creation of the BOSH Director. Removing bosh-state.json causes Ops Manager to treat the deploy as a new deployment, recreating missing virtual machines (VMs), including BOSH. The new deployment ignores existing VMs such as your Pivotal Cloud Foundry deployment.
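The restoration step itself can be sketched in the same way. The snippet below builds a cfops restore command and refuses to continue if the backup directory is missing. As with the backup, the flag names are assumptions based on the cfops README and should be checked against `cfops restore --help`; `build_cfops_restore_cmd` is a hypothetical helper name and all values passed to it are placeholders.

```python
import os
from typing import List


def build_cfops_restore_cmd(host: str, admin_user: str, admin_pass: str,
                            vm_user: str, vm_pass: str, backup_dir: str,
                            tile: str = "ops-manager") -> List[str]:
    """Assemble the cfops restore command line as an argument list.

    Assumption: restore accepts the same flags as backup, with the
    destination flag pointing at the directory that holds the backup.
    """
    # Fail early if the backup location is wrong, before touching Ops Manager.
    if not os.path.isdir(backup_dir):
        raise FileNotFoundError(f"backup directory not found: {backup_dir}")
    return [
        "cfops", "restore",
        "--opsmanagerhost", host,        # the freshly deployed Ops Manager
        "--adminuser", admin_user,
        "--adminpass", admin_pass,
        "--opsmanageruser", vm_user,     # SSH user on the Ops Manager VM
        "--opsmanagerpass", vm_pass,
        "--destination", backup_dir,     # directory created by the backup run
        "--tile", tile,
    ]
```

The resulting list can be handed to `subprocess.run` from the jumpbox once the new Ops Manager VM is reachable.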
Once the restore is done, open the FQDN in a web browser. It will not ask you to set up authentication and will directly prompt for the ID/password.
You will need to apply the changes, and once that is done... it's all good!
And during all these operations, Elastic Runtime and deployed apps were running perfectly fine.
3) Image level VADP backup of Ops Manager VM (in case of VMware)
This is like backing up any standard VM with enterprise backup software that leverages the VADP API.
For this test, I am using EMC Avamar.
Now assume that the Ops Manager VM is corrupted/crashed/lost (for this test I am actually deleting the VM).
And let us restore it from the Avamar backup.
Once the restore is completed, power the VM on (if you did not select that option during the restore), wait a couple of minutes for it to start, and open the FQDN in a web browser.
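If you would rather not keep refreshing the browser while the VM boots, the waiting can be scripted. The sketch below polls the Ops Manager URL until it answers over HTTPS. It assumes the restored Ops Manager presents a self-signed certificate, so certificate verification is switched off for this check only. The names `https_responds` and `wait_until_up`, the retry counts and the URL are all illustrative choices, not part of any Ops Manager tooling.

```python
import http.client
import ssl
import time
import urllib.error
import urllib.request
from typing import Callable


def https_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    # Assumption: verification is disabled because a restored Ops Manager
    # commonly uses a self-signed certificate. Do not reuse this for
    # anything beyond a simple "is it up yet" check.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        urllib.request.urlopen(url, timeout=timeout, context=ctx)
        return True
    except urllib.error.HTTPError:
        return True  # the server is up, it just returned an error status
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return False  # not reachable yet


def wait_until_up(url: str,
                  probe: Callable[[str], bool] = https_responds,
                  attempts: int = 30,
                  delay: float = 10.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Probe the URL up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_up("https://opsman.example.com")` returns True as soon as the web interface responds, or False after roughly five minutes with the defaults. The `probe` and `sleep` parameters exist only so the loop can be exercised without a real Ops Manager.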
It will prompt for the decryption passphrase.
Once entered, it will prompt for login/password. That's it... it's fully operational again.
I feel this is the simplest way of backing up and restoring Ops Manager.
Hope this will be useful.