Migrating a live, running system into RAM and running it from there makes it possible to carry out, remotely, system modifications that would otherwise require physical access, a KVM, or (in the case of a VM) console access. This is based on the following Stack Exchange discussions:
https://askubuntu.com/questions/1416758/remote-full-system-backup-of-a-running-system
as well as my own experiences.
- The target system uses the systemd init system. The procedure can work with other init systems, but they need their own specific approach.
- You are accessing the system via an SSH connection.
- The system doesn't have an unusual networking setup, since that could complicate matters.
- You are able to easily log in as root, or gain root privileges, ideally without invoking sudo.
- All commands are executed as root.
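The assumptions above can be checked before starting. The following is only a minimal sketch: the function names are hypothetical, and it assumes that PID 1 reports its name in /proc/1/comm and that OpenSSH has exported SSH_CONNECTION into the session (which may not survive su or sudo).

```shell
#!/bin/sh
# Pre-flight checks for the assumptions listed above (hypothetical helper names).
# Each function accepts an optional argument so it can be tried without
# touching the real system; with no argument it inspects the live machine.

# Succeeds when the given (or current) UID is 0.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# Succeeds when the given (or actual) name of PID 1 is "systemd".
init_is_systemd() {
    [ "${1:-$(cat /proc/1/comm 2>/dev/null)}" = "systemd" ]
}

# Succeeds when the session looks like SSH, judged by SSH_CONNECTION.
over_ssh() {
    [ -n "${1:-${SSH_CONNECTION:-}}" ]
}
```

Possible usage: is_root && init_is_systemd && over_ssh || echo "one of the assumptions does not hold"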
https://www.reddit.com/r/linux/comments/azsbt2/comment/ei9qwaz/ - reinstall the os via a VM
Make sure no one else is using the system and nothing else important is going on. It's probably a good idea to stop service-providing units like httpd or ftpd, just to ensure external connections don't disrupt things in the middle.
systemctl stop httpd
systemctl stop nfs-server
# and so on....
You might want to shut down as many services as possible by going into runlevel 3 (multi-user.target in systemd terms) or lower.
CAUTION: Low runlevels might disable network managers or shut down the system!
#see available targets:
systemctl list-units --type=target
systemctl isolate multi-user.target
#lowest runlevel possible, ssh session should stay connected but sshd and networking services will be stopped
#systemctl isolate rescue.target
Optional: Save the list of mounted file systems (reference) to a file mounted_fs:
df -TH > mounted_fs
Then, try to stop stoppable running services (excluding SSH):
systemctl list-units --type=service --state=running --no-pager --no-legend | awk '!/ssh/ {print $1}' | xargs -r systemctl stop
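Since the pipeline above stops everything that does not mention ssh, it may be worth previewing the list first. The sketch below is a dry-run filter, not part of the original procedure: services_to_stop and the KEEP pattern are hypothetical, and the extra entries in KEEP (network-related units) are an assumption about what you might want to leave running. Unlike the awk filter above, it matches only the unit name, not the description.

```shell
#!/bin/sh
# KEEP is an illustrative keep-list (an assumption, adjust it for the machine):
# unit names matching this extended regex are never printed.
KEEP='ssh|NetworkManager|network'

# services_to_stop: read `systemctl list-units --no-legend` output on stdin
# and print the names of the service units that are NOT on the keep-list.
services_to_stop() {
    awk -v keep="$KEEP" '$1 ~ /\.service$/ && $1 !~ keep { print $1 }'
}
```

Possible usage: systemctl list-units --type=service --state=running --no-pager --no-legend | services_to_stop
Once the printed list looks right, it can be piped into xargs -r systemctl stop.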
umount -a
This will print a number of 'Target is busy' warnings, for the root volume itself and for various temporary/system FSs. These can be ignored for the moment. What's important is that no on-disk filesystems remain mounted, except the root filesystem itself. Verify this:
# mount alone provides the info, but column makes it possible to read
mount | column -t
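The check can also be narrowed to the mounts that matter. This is a sketch under the assumption that "on-disk" means a source device under /dev; the function name disk_mounts is hypothetical. Network filesystems and other sources not under /dev would not be caught by it.

```shell
#!/bin/sh
# disk_mounts: read /proc/mounts-style lines on stdin and print the mountpoints
# of filesystems backed by a /dev device, except the root filesystem itself.
disk_mounts() {
    awk '$1 ~ /^\/dev\// && $2 != "/" { print $2 }'
}
```

Possible usage: disk_mounts < /proc/mounts
Empty output means only the root filesystem is left on disk.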
If you see any on-disk filesystems still mounted, then something is still running that shouldn't be. Check what it is using fuser:
# if necessary:
yum install psmisc
# then:
fuser -vm <mountpoint>
systemctl stop <whatever>
umount -a
# repeat as required...
Note: if /tmp is a directory on /, we will not be able to unmount / later in this procedure if we use /tmp/tmproot. Thus it may be necessary to use an alternative mountpoint such as /tmproot instead.
Edit: 22.06.24 - Modified to use /tmproot by default
mkdir /tmproot
mount -t tmpfs none /tmproot
mkdir /tmproot/{proc,sys,dev,run,usr,var,tmp,oldroot}
cp -ax /{bin,etc,mnt,sbin,lib,lib64,root} /tmproot/
cp -ax /usr/{bin,sbin,lib,lib64,libexec} /tmproot/usr/
# For clean systems, copying everything is okay
#cp -ax /var/{account,empty,lib,local,lock,nis,opt,preserve,run,spool,tmp,yp} /tmproot/var/
# Docker has lots of huge and unnecessary files, so it's better to exclude it
rsync -aAXv --exclude='lib/docker/' /var/{account,empty,lib,local,lock,nis,opt,preserve,run,spool,tmp,yp} /tmproot/var/
This creates a very minimal root system, which breaks (among other things) manpage viewing (no /usr/share), user-level customizations (no /home) and so forth. This is intentional, as it constitutes encouragement not to stay in such a jury-rigged root system any longer than necessary.
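Because the temporary root is a tmpfs, everything copied into it has to fit in memory, and a tmpfs mounted without a size option is capped at half of the physical RAM by default. Before running the copy commands, a rough size check may help. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the rule "needed size should be at most half of MemAvailable" is an arbitrary safety margin, not something the original procedure prescribes.

```shell
#!/bin/sh
# mem_available_kb: read /proc/meminfo-style text on stdin and print the
# MemAvailable value in kB.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }'
}

# fits_in_tmpfs NEEDED_KB AVAILABLE_KB: succeeds when the needed size is at
# most half of the available memory (illustrative margin, an assumption).
fits_in_tmpfs() {
    [ "$(( $1 * 2 ))" -le "$2" ]
}
```

Possible usage, summing the directories that are about to be copied:
needed=$(du -sxk /bin /etc /sbin /lib /lib64 /usr/bin /usr/sbin /usr/lib /usr/lib64 /usr/libexec 2>/dev/null | awk '{ s += $1 } END { print s }')
fits_in_tmpfs "$needed" "$(mem_available_kb < /proc/meminfo)" || echo "temporary root may not fit in RAM"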
At this point you should also ensure that all the necessary software is installed, since running from the temporary root will almost certainly break the package manager. Glance through all the remaining steps, and make sure you have the necessary executables.
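A quick way to glance through the executables is to ask the shell which ones it cannot find. This is a minimal sketch; missing_tools is a hypothetical helper and the tool list in the usage line is only an example. Note that it checks the current PATH, so a tool living outside the copied directories (for example under /usr/local or /opt) would be found now but be absent from the temporary root.

```shell
#!/bin/sh
# missing_tools TOOL...: print each named tool that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Possible usage: missing_tools pivot_root fuser rsync resize2fs lvresize
Anything printed should be installed before going further.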
mount --make-rprivate / # necessary for pivot_root to work
pivot_root /tmproot /tmproot/oldroot
for i in dev proc sys run; do mount --move /oldroot/$i /$i; done
systemd causes mounts to allow subtree sharing by default (as with mount --make-shared), and this causes pivot_root to fail. Hence, we turn this off globally with mount --make-rprivate /. System and temporary filesystems are moved wholesale into the new root. This is necessary to make it work at all; the sockets for communication with systemd, among other things, live in /run, and so there's no way to make running processes close it.
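Whether the root mount is currently shared can be read from /proc/self/mountinfo, where a shared mount carries a shared:N tag among the optional fields before the "-" separator. The sketch below parses that format; root_propagation is a hypothetical name, and it reports only the shared/private distinction, ignoring slave and unbindable states.

```shell
#!/bin/sh
# root_propagation: read /proc/self/mountinfo-style lines on stdin and print
# "shared" or "private" for the mount at "/". If several mounts are stacked
# on "/", the last one listed wins.
root_propagation() {
    awk '$5 == "/" {
        state = "private"
        # optional fields start at field 7 and end at the "-" separator
        for (i = 7; i <= NF && $i != "-"; i++)
            if ($i ~ /^shared:/) state = "shared"
    }
    END { if (state != "") print state }'
}
```

Possible usage: root_propagation < /proc/self/mountinfo
It should print "private" after mount --make-rprivate / and "shared" again once sharing is re-enabled. Where available, findmnt -no PROPAGATION / should report the same information.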
systemctl restart sshd
systemctl status sshd
After restarting sshd, ensure that you can get in, by opening another terminal and connecting to the machine again via ssh. If you can't, fix the problem before moving on.
Once you've verified you can connect in again, exit the shell you're currently using and reconnect. This allows the remaining forked sshd to exit and ensures the new one isn't holding /oldroot.
fuser -vm /oldroot
This will print a list of processes still holding onto the old root directory. On my system, it looked like this:
             USER              PID ACCESS COMMAND
/oldroot:    root           kernel mount /oldroot
             root                1 ...e. systemd
             root              549 ...e. systemd-journal
             root              563 ...e. lvmetad
             root              581 f..e. systemd-udevd
             root              700 F..e. auditd
             root              723 ...e. NetworkManager
             root              727 ...e. irqbalance
             root              730 F..e. tuned
             root              736 ...e. smartd
             root              737 F..e. rsyslogd
             root              741 ...e. abrtd
             chrony            742 ...e. chronyd
             root              743 ...e. abrt-watch-log
             libstoragemgmt    745 ...e. lsmd
             root              746 ...e. systemd-logind
             dbus              747 ...e. dbus-daemon
             root              753 ..ce. atd
             root              754 ...e. crond
             root              770 ...e. agetty
             polkitd           782 ...e. polkitd
             root             1682 F.ce. master
             postfix          1714 ..ce. qmgr
             postfix         12658 ..ce. pickup
You need to deal with each one of these processes before you can unmount /oldroot. The brute-force approach is simply kill $PID for each, but this can break things. To do it more softly:
systemctl | grep running
This creates a list of running services. You should be able to correlate this with the list of processes holding /oldroot, then issue systemctl restart for each of them. Some services will refuse to come up in the temporary root and enter a failed state; these don't really matter for the moment.
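The correlation between PIDs and services does not have to be done by eye, because each process records its systemd unit in /proc/PID/cgroup. The following is a sketch only: unit_of_cgroup is a hypothetical helper, and it assumes the unit name is the last path component of a cgroup line and ends in .service (processes in session scopes produce no output).

```shell
#!/bin/sh
# unit_of_cgroup: read the contents of a /proc/PID/cgroup file on stdin and
# print the name of the .service unit the process belongs to, if any.
unit_of_cgroup() {
    sed -n 's|.*/\([^/]*\.service\)$|\1|p' | sort -u
}
```

Possible usage, relying on fuser printing the bare PIDs on standard output:
for pid in $(fuser -m /oldroot 2>/dev/null); do unit_of_cgroup < "/proc/$pid/cgroup"; done | sort -u
The resulting unit names are the candidates for systemctl restart. For a single process, systemctl status PID shows the owning unit as well.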
If the root drive you want to resize is an LVM volume, you may also need to restart some other running services, even if they do not show up in the list created by fuser -vm /oldroot. Otherwise you might be unable to resize the LVM volume in Step 7 because of this error:
fsadm: Cannot proceed with mounted filesystem "/oldroot"
You can try systemctl restart systemd-udevd and, if that fails, you can find the leftover mounts with grep system /proc/*/mounts | column -t
(Suggestion): If umount /oldroot seemed successful, yet the filesystems residing on the LVMs are still regarded as "in use" or "busy", try rm -rf /oldroot, but only after confirming (for example with findmnt /oldroot) that nothing is mounted there any more, since otherwise this would delete the contents of the original root. You might also try restarting all running services listed by systemctl | grep running, as well as once again opening a new ssh session and closing all the old ones. You might also want to run systemctl daemon-reexec again.
- Look for processes that say mounts:none and try restarting these:
PATH BIN FSTYPE
/proc/16395/mounts:tmpfs /run/systemd/timesync tmpfs
/proc/16395/mounts:none /var/lib/systemd/timesync tmpfs
/proc/18485/mounts:tmpfs /run/systemd/inhibit tmpfs
/proc/18485/mounts:tmpfs /run/systemd/seats tmpfs
/proc/18485/mounts:tmpfs /run/systemd/sessions tmpfs
/proc/18485/mounts:tmpfs /run/systemd/shutdown tmpfs
/proc/18485/mounts:tmpfs /run/systemd/users tmpfs
/proc/18485/mounts:none /var/lib/systemd/linger tmpfs
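To turn a listing like the one above into something actionable, the PIDs can be pulled out of the grep output. This is a small sketch with a hypothetical function name; it assumes the input lines look exactly like the /proc/PID/mounts:none entries shown above.

```shell
#!/bin/sh
# pids_with_none_mounts: read `grep ... /proc/*/mounts` output on stdin and
# print the unique PIDs that have at least one mount whose source is "none".
pids_with_none_mounts() {
    sed -n 's|^/proc/\([0-9][0-9]*\)/mounts:none[[:space:]].*|\1|p' | sort -un
}
```

Possible usage: grep system /proc/*/mounts | pids_with_none_mounts | while read -r pid; do ps -o pid=,comm= -p "$pid"; done
This prints each PID with its command name, which can then be matched to a service and restarted.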
Some processes can't be dealt with via a simple systemctl restart. For me these included auditd (which doesn't like to be killed via systemctl, and so just wanted a kill -15). These can be dealt with individually.
The last process you'll find, usually, is systemd itself. For this, run systemctl daemon-reexec.
Once you're done, the table should look like this:
USER PID ACCESS COMMAND
/oldroot: root kernel mount /oldroot
umount /oldroot
At this point, you can carry out whatever manipulations you require. The original question needed a simple resize2fs invocation, but you can do whatever you want here; one other use case is transferring the root filesystem from a simple partition to LVM/RAID/whatever.
Personal commands and suggestions:
It is possible to use lvresize with supported filesystems such as ext4:
lvresize --resizefs --size 200G /dev/mapper/vg_group-lv_name
With --resizefs, lvresize checks the filesystem automatically before resizing it.
Filesystems such as ZFS and XFS generally don't support shrinking. It's best to use rsync to temporarily move the files off such partitions, and then destroy and recreate them with a smaller size.
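When shrinking, the target size has to stay above the space already in use, so a small arithmetic check before resizing can prevent an avoidable failure. The sketch below is illustrative only: shrink_has_headroom is a hypothetical helper, and the 10% headroom it demands is an arbitrary assumption, not a requirement of lvresize or resize2fs. The used size has to be noted while the filesystem is still mounted (for example from the df output saved earlier).

```shell
#!/bin/sh
# shrink_has_headroom USED_KB TARGET_KB: succeeds when the target size is at
# least 110% of the used space (the 10% margin is an illustrative assumption).
shrink_has_headroom() {
    [ "$(( $2 * 100 ))" -ge "$(( $1 * 110 ))" ]
}
```

Possible usage, for 150 GiB used and a 200 GiB target (both expressed in KiB):
shrink_has_headroom $((150 * 1024 * 1024)) $((200 * 1024 * 1024)) && echo "target size leaves headroom"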
mkdir -p /oldroot # recreate the mountpoint if it was removed earlier
mount <blockdev> /oldroot
mount --make-rprivate / # again
mkdir -p /oldroot/tmproot # recreate it if you have removed it before
pivot_root /oldroot /oldroot/tmproot
for i in dev proc sys run; do mount --move /tmproot/$i /$i; done
This is a straightforward reversal of step 4.
Repeat steps 5 and 6, except using /tmproot in place of /oldroot.
Remember to also restart sshd, open a new session, and discard the old sessions!
umount /tmproot
rmdir /tmproot
Since it's a tmpfs, at this point the temporary root dissolves into the ether, never to be seen again.
Mount filesystems again:
mount -a
At this point, you should also update /etc/fstab and grub.cfg in accordance with any adjustments you made during step 7.
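One adjustment that is easy to miss: a filesystem that was destroyed and recreated gets a new UUID, so UUID= entries in /etc/fstab may point at something that no longer exists. The following sketch extracts the UUIDs so they can be compared with what the system actually has; fstab_uuids is a hypothetical helper name.

```shell
#!/bin/sh
# fstab_uuids: read fstab-style text on stdin and print the UUID of every
# entry whose first field has the form UUID=... (comment lines never match).
fstab_uuids() {
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); gsub(/"/, "", $1); print $1 }'
}
```

Possible usage: for u in $(fstab_uuids < /etc/fstab); do blkid -U "$u" >/dev/null || echo "no device with UUID $u"; done
Any UUID reported here needs to be corrected in /etc/fstab before rebooting.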
systemctl | grep failed
systemctl restart <whatever>
Allow shared subtrees again:
mount --make-rshared /
Start the stopped service units - you can use this single command:
systemctl isolate default.target
And you're done with the main part.
You will likely want to reboot the system at this point anyway, to ensure system consistency and stability.
Depending on what you've done while running the system from RAM, you might want to take steps to ensure the system will boot correctly. When using GRUB, usually all that is required is:
update-grub
or, where the update-grub wrapper is not available (it is a Debian/Ubuntu convenience script):
grub-mkconfig -o /boot/grub/grub.cfg