Move docs into a renderable form for mkdocs. This necessitated a reordering of things inside this repo. Signed-off-by: Wachtl Enterprises LLC <tyrolyean@escpe.net>
% ITS Disaster recovery plan
About
This file describes how to perform disaster recovery if everything breaks down. I cannot cover every catastrophic event that may occur, so I cover the failure scenarios that come to mind as the most likely reasons for everything to stop working.
Scenario 1: Hypervisor dies
The current hypervisor (acraze.srv.it-syndikat.org) may die spontaneously.
First check whether the server is just hanging in unlock (it probably is).
If the server really has died, which may happen at any time and for any
reason, the simplest and probably fastest recovery method is to put all hard
disks into a new, similar server and boot from them. If that is not an option,
we have daily backups on bringmethehorizon.cuco (the server inside cuco, which
is now virtualized). You can of course restore from there at any point in time.
The servers are all connected to a single port on the firewall, and network
configuration is handled entirely by the firewall. Restoring should be a fairly
fast operation overall.
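Before moving any disks, it can help to confirm whether the hypervisor is reachable at all. The following is a minimal sketch, not part of the existing tooling: the helper name `is_port_open` is hypothetical, and probing TCP port 22 assumes that SSH listens on its default port on this host.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `is_port_open("acraze.srv.it-syndikat.org", 22)` from a machine that can route to the server network gives a quick first indication. A `False` result only means the port did not answer; it does not tell you whether the machine is waiting in unlock or is actually dead.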
Scenario 2: The firewall dies
The current firewall (sozial.asozial.it-syndikat.org) may spontaneously catch fire and die at any time. To recover from this, try booting the internal SSD on any hardware that has at least as many network ports. The OS will detect the interface changes and ask you to reassign them. If that is not an option, a few router configurations are stored in the resources section of this git repository. Install pfSense on alternative hardware and restore one of these configurations. This process should be fairly straightforward.
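Before restoring a saved configuration onto replacement hardware, it may be useful to see which interfaces the backup expects. The sketch below assumes the usual layout of a pfSense configuration backup: an XML file with a `pfsense` root element and an `interfaces` section whose children (for example `wan`, `lan`) each carry an `if` element naming the physical port. Verify this against the actual files in the resources section. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def list_assigned_interfaces(config_xml: str) -> Dict[str, str]:
    """Map logical interface names (wan, lan, ...) to physical port names."""
    root = ET.fromstring(config_xml)
    # Assumption: pfSense backups use <pfsense> as the root element.
    if root.tag != "pfsense":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    assigned: Dict[str, str] = {}
    interfaces = root.find("interfaces")
    if interfaces is None:
        return assigned
    for entry in interfaces:
        # Each child is one logical interface; <if> holds the device name.
        assigned[entry.tag] = entry.findtext("if", default="")
    return assigned
```

Reading a backup file and passing its contents to `list_assigned_interfaces` shows how many ports the replacement hardware needs and which assignments you will be asked to redo.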
Scenario 3: The ldap server dies
This may happen for many reasons. If you rely on LDAP for authorization on hosts, this may be a disaster for you. If you have access to the vaultwarden (which does not rely on LDAP), you can use the recovery root SSH key to SSH into the LDAP machine (currently blacksunempire.srv.it-syndikat.org) and diagnose slapd. The problem may be resolved by simply restarting slapd.service. If not, slapd needs actual debugging, in which case you should probably contact someone who knows LDAP. If you don't have access to the vaultwarden, contact someone who does. If you are reading this as a precautionary measure: register in the vaultwarden and download the SSH key now.
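The SSH steps above can be sketched as a small helper. This is an illustration under assumptions, not an existing script: the function names are hypothetical, the key path is whatever location you saved the recovery key to, and the login user is assumed to be `root` because the key is described as a recovery root key.

```python
import os
import subprocess
from typing import List

LDAP_HOST = "blacksunempire.srv.it-syndikat.org"


def build_ssh_command(key_path: str, remote_command: str,
                      host: str = LDAP_HOST, user: str = "root") -> List[str]:
    """Build an ssh invocation that runs remote_command on the LDAP host."""
    # Expand "~" ourselves because no shell is involved in subprocess.run.
    key = os.path.expanduser(key_path)
    return ["ssh", "-i", key, f"{user}@{host}", remote_command]


def run_remote(key_path: str, remote_command: str) -> subprocess.CompletedProcess:
    """Run the command over SSH and return its exit code and captured output."""
    return subprocess.run(
        build_ssh_command(key_path, remote_command),
        capture_output=True,
        text=True,
        check=False,  # A failing slapd is expected here; inspect returncode.
    )
```

A typical sequence would be `run_remote(key, "systemctl status slapd.service")` to see the current state, followed by `run_remote(key, "systemctl restart slapd.service")` if a restart looks sufficient.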