====== LXC ======

===== Installation =====

  * https://wiki.debian.org/LXC

===== Networking =====

Options:

  * Create a bridge on the host (NATed/routed)
  * VLAN + bridge setup
  * Use the libvirt package for easy network setup

Network resources:

  * http://manpages.ubuntu.com/manpages/precise/man5/lxc.conf.5.html
  * Working bridge config (French): http://blog.champs-libres.coop/sysadmin/2015/08/18/ip-failover-ovh-container-lxc.html
  * NAT and bridge howto (adding the bridge manually may result in network loss after 30-60 min): https://stackoverflow.com/questions/25042542/how-do-i-connect-a-lxc-container-to-an-ip-alias?rq=1
  * https://wiki.debian.org/LXC/SimpleBridge
  * http://wiki.libvirt.org/page/Networking#Bridged_networking_.28aka_.22shared_physical_device.22.29
  * ARP debugging: http://www.claudiokuenzler.com/blog/551/network-problem-lxc-same-subnet-as-host-in-vmware
  * ARP debugging: http://www.microhowto.info/troubleshooting/troubleshooting_ethernet_bridging_on_linux.html

==== Bridge ====

Exposes the public IPs into a container. Setup for OVH / SoYouStart failover IPs.

The failover IPs can be mapped directly into containers. **A virtual MAC address must be assigned to the failover IP in the OVH web manager first!**

Check for too many ARP requests from the container! OVH may block the IP. **The requests can be avoided by assigning virtual MACs to all failover IPs:**

  tcpdump -v arp

**Do not set up the failover IPs in /etc/network/interfaces on the host or inside the VS - only in the container config file!**

**Move eth0 to br0 in /etc/network/interfaces - eth0 becomes "manual" and is added as "bridge_ports eth0":**

  auto eth0
  iface eth0 inet manual
  auto br0
  iface br0 inet static
      address 1.2.3.4
      netmask 255.255.255.0
      network 1.2.3.0
      broadcast 1.2.3.255
      # at OVH the gateway is always the main IP's x.x.x.254
      gateway 1.2.3.254
      # add these devices to the bridge (can be eth0 eth1 ..)
      bridge_ports eth0
      # no spanning tree protocol
      bridge_stp off
      bridge_fd 0
      bridge_maxwait 0

**Container config:**

  # comment out the default empty network interface
  #lxc.network.type = empty
  # bridge setup:
  # network type: virtual ethernet pair, host end attached to the bridge
  lxc.network.type = veth
  # bring the interface up
  lxc.network.flags = up
  # host bridge interface
  lxc.network.link = br0
  # failover IP to use, /24
  lxc.network.ipv4 = fail.over.ip.x/24
  # the gateway of the main IP is x.x.x.254
  lxc.network.ipv4.gateway = MAIN.IP.GATEWAY.254
  # IMPORTANT! the MAC address assigned to the failover IP in the web manager
  lxc.network.hwaddr = 02:00:00:3c:95:31

Inside VS /etc/network/interfaces:

  auto lo
  iface lo inet loopback
  auto eth0
  iface eth0 inet manual
  #auto eth0
  #iface eth0 inet dhcp

**Set up bridge options in /etc/sysctl.conf:**

  net.bridge.bridge-nf-call-ip6tables = 0
  net.bridge.bridge-nf-call-iptables = 0
  net.bridge.bridge-nf-call-arptables = 0

**reload:**

  sysctl -p /etc/sysctl.conf

\\

==== Simple Nat Bridge ====

Easy version without libvirt - works well at OVH and Hetzner.
**Add an additional bridge (keep eth0 as is) in /etc/network/interfaces**

  auto lxc-bridge
  iface lxc-bridge inet static
      bridge_ports none
      bridge_fd 0
      bridge_maxwait 0
      address 192.168.10.1
      netmask 255.255.255.0
      up iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

**activate forwarding temporarily:**

  echo 1 > /proc/sys/net/ipv4/ip_forward

**activate forwarding permanently:** Uncomment in /etc/sysctl.conf

  net.ipv4.ip_forward=1

Activate new settings:

  sysctl -p

**firewall rules**

  # intern -> extern
  iptables -t nat -A POSTROUTING -s 192.168.10.10/24 -j SNAT --to-source 1.2.3.4
  # ports extern -> intern - 1 rule for each $PORT
  iptables -t nat -A PREROUTING -d 1.2.3.4 -p tcp --dport $PORT -j DNAT --to 192.168.10.10:${PORT}

**Container config:**

  lxc.network.type = veth
  lxc.network.flags = up
  lxc.network.link = lxc-bridge
  lxc.network.ipv4.gateway = 192.168.10.1
  lxc.network.ipv4 = 192.168.10.10/24

  * https://wiki.debian.org/LXC/SimpleBridge

\\

==== Nat Bridge via libvirt and IPV6 ====

**Deprecated! - if you do not need DHCP, use the simple bridge method without libvirt/virsh!**

  apt-get install libvirt-bin
  virsh net-info default
  virsh net-start default

**autostart:**

  virsh net-autostart default
  virsh net-info default

**configure - do not edit the config file directly, it gets overwritten! - set static IPs:**

First set mcedit as the default editor in /root/.bashrc:

  export EDITOR='mcedit'

Then edit the network definition (name default, uuid a07016bb-2e96-2000-9e16-b93b12245329):

  virsh net-edit default

**edit VS config, add:**

  lxc.network.type = veth
  lxc.network.flags = up
  lxc.network.link = virbr0
  # hwaddr is not required and not useful: inter-VS connections break when it is set
  # lxc.network.hwaddr = CC:AA:FF:EE:00:01
  lxc.network.ipv4 = 192.168.122.100/24
  # auto usually works, otherwise set the main IP gateway (.254 at OVH)
  lxc.network.ipv4.gateway = auto
  # ipv6:
  lxc.network.ipv6 = 1234:1234:1234:1234::100/64
  lxc.network.ipv6.gateway = auto

**edit /etc/network/interfaces _inside_ VS - just set eth0 to manual:**

  # The loopback network interface
  auto lo
  iface lo inet loopback
  iface lo inet6 loopback
  ## IPv4:
  auto eth0
  iface eth0 inet manual
  ## IPv6:
  iface eth0 inet6 static
      address 1234:1234:1234:1234::131
      netmask 64

**Check /etc/resolv.conf inside VS:**

  # for testing, add google's dns:
  nameserver 8.8.8.8

**make persistent:**

  virsh net-autostart default

**restart network:**

  virsh net-destroy default
  virsh net-start default

**if you remove this net, disable autostart**

  virsh net-autostart default --disable

**activate forwarding temporarily:**

  echo 1 > /proc/sys/net/ipv4/ip_forward

**activate forwarding permanently:** Uncomment in /etc/sysctl.conf

  net.ipv4.ip_forward=1

Activate new settings:

  sysctl -p

**iptables config (1.2.3.4 is the public IP of the root system):**

  iptables -t nat -I PREROUTING -p tcp -d 1.2.3.4 --dport 80 -j DNAT --to-destination 192.168.122.100:80
  iptables -I FORWARD -m state -d 192.168.122.100/24 --state NEW,RELATED,ESTABLISHED -j ACCEPT

**ip6tables config for IPv6:**

  echo "generic IPv6 setup.."
  # add NEW, or not?
  ip6tables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
  ip6tables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
  ip6tables -t filter -A INPUT -i lo -j ACCEPT
  ip6tables -t filter -A OUTPUT -o lo -j ACCEPT
  ip6tables -t filter -P FORWARD ACCEPT
  # ping
  ip6tables -t filter -A INPUT -p ipv6-icmp -j ACCEPT
  ip6tables -t filter -A OUTPUT -p ipv6-icmp -j ACCEPT
  echo "IPv6 setup FOREACH vserver.."
  ip -6 route add 1234:1234:1234:1234::100 dev virbr0
  ip -6 neigh add proxy 1234:1234:1234:1234::100 dev eth0
  ping6 -I virbr0 -c 5 1234:1234:1234:1234::100

\\

===== IPv6 setup =====

.. is tricky! Besides the settings in the previous section (virsh, iptables, interfaces, ..) it may be required to set up:

/etc/sysctl.conf - working setting

  # v2 - vs ipv6:
  net.ipv6.conf.all.autoconf = 0
  net.ipv6.conf.default.autoconf = 0
  net.ipv6.conf.eth0.autoconf = 0
  net.ipv6.conf.default.accept_ra = 0
  # accept_ra = 2 seems a bit weird for a boolean. It's a special value to allow IPv6 forwarding AND RA
  net.ipv6.conf.all.accept_ra = 2
  net.ipv6.conf.eth0.accept_ra = 2
  # bridging (tk)
  net.bridge.bridge-nf-call-ip6tables = 0
  net.bridge.bridge-nf-call-iptables = 0
  net.bridge.bridge-nf-call-arptables = 0
  # required for ipv6 forwarding
  net.ipv6.conf.all.proxy_ndp = 1
  #net.ipv6.conf.eth0.proxy_ndp = 1
  #net.ipv6.conf.virbr0.proxy_ndp=1
  net.ipv6.conf.default.forwarding = 1

The radvd router advertisement daemon seems to be required, too:

/etc/radvd.conf

  interface virbr0 {
      AdvReachableTime 300000;
      # MinRtrAdvInterval 60;
      # MaxRtrAdvInterval 120;
      MinRtrAdvInterval 3;
      MaxRtrAdvInterval 10;
      AdvSendAdvert on;
      AdvManagedFlag off;
      AdvOtherConfigFlag off;
      prefix ::/64 {
          AdvOnLink on;
          AdvAutonomous on;
          AdvRouterAddr on;
      };
  };

**Helpful commands - inside and outside of container:**

  ip -6 r
  ip neigh show
  ping6 container_ip
  ping6 world_from_container

FIXME

  * IPv6 often starts to work once you ping a few times in <-> out: the neighbour table needs to be built up and goes stale after a short time
  * including ping6 in the firewall startup script builds up the route
  * something about "proxy" is required - see the ip6tables section above

\\

===== Usage =====

==== Create new container ====

For unprivileged containers, edit /etc/subuid and /etc/subgid first and add matching lines to /etc/lxc/default.conf - see below! http://wiki.fr33.info/doku.php/linux/virtualization/lxc?&#unprivileged_containers

FIX: the original keyserver is broken! Add: --keyserver hkp://keyserver.ubuntu.com

  lxc-create -n debian8 -B btrfs -t debian -- -r jessie --keyserver hkp://keyserver.ubuntu.com

or

  lxc-create -n websrv -t debian-wheezy -B btrfs --keyserver hkp://keyserver.ubuntu.com

Start / Stop VS:

  lxc-start -n websrv
  lxc-stop -n websrv

Enter VS:

  lxc-console -n websrv

In Buster, use the lxc-download script:

  /usr/share/lxc/templates/lxc-download --list --no-validate| grep debian | grep amd64
  lxc-create -t /usr/share/lxc/templates/lxc-download -n -- --no-validate -d debian -r buster -a amd64

==== Clone container ====

Copy all data:

  lxc-clone --backingstore btrfs --orig vs1 --new vs2

Make a btrfs snapshot:

  lxc-clone --backingstore btrfs --orig vs1 --new vs2 --snapshot

\\

===== Mount external Dirs in Container =====

The recommended way is to add the mountpoint with a relative path in the VS config:

  lxc.mount.entry=/home/mountme home none bind,optional,relative,create=dir

Under some circumstances this does not work (in unprivileged containers), but this works:

  lxc.mount.entry = /home/test /home/vservers/stretch/rootfs/home/test none bind 0 0

Also check permissions and ownership: chown to the root ID inside the container.

\\

===== btrfs snapshots =====

The container must be stopped for lxc-snapshot. Use a btrfs snapshot to back up running containers (MySQL may get inconsistent). You need to create the container with the option -B btrfs!!
  lxc-create -B btrfs -n mycontainer -t ubuntu

Workaround to add btrfs snapshots after creating a container:

  mv /home/vservers/my-lxc-container/rootfs /home/vservers/my-lxc-container/rootfs.saved
  btrfs subvolume create /home/vservers/my-lxc-container/rootfs
  btrfs subvolume list /home/vservers
  # for unprivileged root container, check UID and GID of rootfs dir (here it is 100000):
  chown 100000:100000 /home/vservers/my-lxc-container/rootfs/
  mv /home/vservers/my-lxc-container/rootfs.saved/* /home/vservers/my-lxc-container/rootfs/
  lxc-snapshot -n my-lxc-container

Snapshot with comment:

  echo "working my-lxc-container before ..." > snap-comment
  lxc-stop -n my-lxc-container
  lxc-snapshot -n my-lxc-container -c snap-comment
  rm snap-comment

  * https://uli-heller.github.io/blog/2013/06/09/lxc-snapshots/

==== Copy container ====

To move the container to another machine, take care of the user/group IDs.

pack:

  tar --numeric-owner -czvf container.tar.gz ./*

move.. unpack:

  tar --numeric-owner -xzvf container.tar.gz

\\

===== Security =====

==== Unprivileged containers ====

UIDs and GIDs are shifted to another range, so root (uid 0) becomes 100000, for example. Inside the container this is not visible, but from outside you see the uids 100000+.

You can still run these containers as root. You only have to add root to /etc/subuid and /etc/subgid - then it is the same as running the containers as a user. For best security, each container should have its own uid/gid range, although it is unlikely that someone breaks out of one container and enters another.

Only available in LXC 1.0 and later, not in Debian squeeze :(

**run unprivileged container as root:** add root to /etc/subuid and /etc/subgid,

  root:100000:65536

**vs config - map user ids:** put this in /etc/lxc/default.conf too!
  lxc.id_map = u 0 100000 65536
  lxc.id_map = g 0 100000 65536

In buster it's called idmap:

  lxc.idmap = u 0 100000 65536
  lxc.idmap = g 0 100000 65536

**shift uids to another range:** Use this script: https://github.com/exaexa/chownmap

  /root/bin/chownmap 0 200000 65536 /home/vservers//rootfs/

Create container - use the download method for unprivileged containers. jessie is not available, so you can upgrade wheezy and fix the systemd error :(

FIX for download: the original keyserver is broken, add --keyserver hkp://keyserver.ubuntu.com

  lxc-create -B btrfs -t download -n websrv --keyserver hkp://keyserver.ubuntu.com
  # error no jessie:
  lxc-create -B btrfs -n websrv -t download -- -d debian -r jessie -a amd64 --keyserver hkp://keyserver.ubuntu.com
  # error not working with unprivileged
  LANG=C SUITE=jessie MIRROR=http://httpredir.debian.org/debian lxc-create -n websrv -B btrfs -t debian

**Unprivileged related bugfixes**

  lxc-start: Permission denied - failed to create directory '/usr/lib/x86_64-linux-gnu/lxc/rootfs/lxc_putold'

This is caused by wrong permissions of the rootfs. Set:

  chown 100000:100000 /vservers//rootfs

If you copy files from outside into the container, they have the wrong uid/gid.
If the file should belong to root, just run this from the root system:

  chown 100000:100000 /vservers//rootfs/

**Links:**

  * https://unix.stackexchange.com/questions/127554/building-unprivileged-userns-lxc-container-from-scratch-by-migrating-a-privil
  * https://www.stgraber.org/2014/01/01/lxc-1-0-security-features/
  * https://blog.deimos.fr/2014/08/29/lxc-1-0-on-debian-wheezy/
  * https://wiki.deimos.fr/LXC_:_Install_and_configure_the_Linux_Containers
  * https://unix.stackexchange.com/questions/177030/what-is-an-unprivileged-lxc-container

\\

===== Bugfixes =====

  * If a container doesn't start, check if /dev /sys /proc exist
  * Check log files in /containerrootdir/
  * Check network settings in config

**Network - no outgoing connection from container:**

  * check resolv.conf
  * reboot: iptables may be messed up

**Squeeze-LTS Containers are tricky:**

  * If they don't start, use /etc/ from a default container.
  * If you cannot log in ("no more processes in this runlevel"), use /etc/inittab from a default container. They have a different tty setup:

  #5:23:respawn:/sbin/getty 38400 tty5
  #6:23:respawn:/sbin/getty 38400 tty6

  * If you get "TERM variable not set", set it, e.g. in /root/.bashrc:

  TERM=linux
  export TERM

**Jessie Containers are tricky:** systemd prevents start.
Remove systemd or prevent its installation before upgrading to jessie: http://without-systemd.org/wiki/index.php/How_to_remove_systemd_from_a_Debian_jessie/sid_installation

Fix by adding this to the container config - it didn't always work, sometimes dbus errors appear when running apt:

  # Custom container options
  lxc.mount.auto = cgroup:mixed
  lxc.mount.entry = tmpfs dev/shm tmpfs rw,nosuid,nodev,create=dir 0 0
  lxc.mount.entry = tmpfs run tmpfs rw,nosuid,nodev,mode=755,create=dir 0 0
  lxc.mount.entry = tmpfs run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,create=dir 0 0
  # lxc.mount.entry = debugfs sys/kernel/debug debugfs rw,relatime 0 0
  lxc.mount.entry = mqueue dev/mqueue mqueue rw,relatime,create=dir 0 0
  # lxc.mount.entry = hugetlbfs dev/hugepages hugetlbfs rw,relatime,create=dir 0 0

Also make sure that you have the following line in your /etc/lxc/lxc.conf:

  lxc.cgroup.use = @all

  * https://github.com/debops/ansible-lxc/issues/15

**systemd cgroups mess**

  Could not find writable mount point for cgroup hierarchy 12 while trying to create cgroup

12 is a systemd hierarchy - if you remove systemd and switch to sysvinit-core, this might be a leftover.
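To see which hierarchy number belongs to systemd on a given host, the lines of /proc/self/cgroup (format "id:controllers:path") can be filtered. This is only a rough sketch: ''systemd_hierarchy_ids'' is a hypothetical helper name, and on a pure cgroup v2 host it will simply print nothing.

```shell
#!/bin/sh
# Sketch: print the hierarchy id(s) of the name=systemd cgroup hierarchy.
# Lines in /proc/self/cgroup have the form "id:controllers:path".
# systemd_hierarchy_ids is a hypothetical helper; $1 defaults to /proc/self/cgroup.
systemd_hierarchy_ids() {
    awk -F: '$2 == "name=systemd" { print $1 }' "${1:-/proc/self/cgroup}"
}

# Only query the live file where it exists (Linux).
if [ -r /proc/self/cgroup ]; then
    systemd_hierarchy_ids
fi
```

If the printed number matches the one in the error message, the systemd hierarchy is still mounted.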
FIXME: check that all of systemd is gone (uninstall everything listed with status ii):

  dpkg -l '*systemd*'
  apt remove --purge '*systemd*'
  # without systemd apt/preferences.d/ must not be set

/etc/pam.d/common-session - unset this line:

  session optional pam_cgfs.so -c freezer,memory,name=systemd

Check if 12 is still active:

  cat /proc/self/cgroup

WORKAROUND: mcedit /etc/lxc/lxc.conf and remove

  lxc.cgroup.use = @all

  * this is helpful: https://github.com/lxc/lxc/issues/1279
  * this is not: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=769494

\\

**SSH Config**

**root login** The Debian Jessie default container has this option set, so you cannot log in as root with a password:

  PermitRootLogin without-password

**pam login error**

  error: PAM: pam_open_session(): Cannot make/remove an entry for the specified session

/etc/pam.d/sshd

  # session required pam_loginuid.so

  * http://gaijin-nippon.blogspot.de/2013/07/audit-on-lxc-host.html

or run this inside the container:

  sed '/pam_loginuid.so/s/^/#/g' -i /etc/pam.d/*

FIXME maybe insecure

**apt-get update broken via ipv6**

  apt-get update -o Acquire::ForceIPv4=true

permanent apt via ipv4:

  echo 'Acquire::ForceIPv4 "true";' | tee /etc/apt/apt.conf.d/99force-ipv4

**rsyslog error** TESTME

rsyslog doesn't start on boot and there are errors in syslog:

  .. rsyslogd: imklog: cannot open kernel log(/proc/kmsg): Permission denied.
  .. rsyslogd-2145: activation of module imklog failed [try http://www.rsyslog.com/e/2145 ]

Disable kernel logging in the container's /etc/rsyslog.conf:

  # $ModLoad imklog # provides kernel logging support

\\

===== Move a root system into container =====

You can disable many services:

  * udev: the udev service (which is a hard dependency of systemd in Jessie) won't run in a container, but systemd recognizes this
  * apparmor: mounts securityfs; would require disabling the dropped capabilities, which is a bad idea
  * kmod
  * lm-sensors
  * dbus
  * kbd
  * hdparm
  * ...
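As a rough sketch of how the services in this list could be switched off in one go, the loop below only prints the commands instead of running them. It assumes a sysvinit-style container where ''update-rc.d'' is available; the names are copied from the list and may not match the init script names on every release, and ''print_disable_cmds'' is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print (not run) the commands that would disable the listed services.
# Assumption: sysvinit-style container with update-rc.d; names are taken
# from the list in this section and may differ from the real init scripts.
SERVICES="udev apparmor kmod lm-sensors dbus kbd hdparm"

print_disable_cmds() {
    for svc in $SERVICES; do
        # Only print; review the output before piping it to sh.
        echo "update-rc.d $svc disable"
    done
}

print_disable_cmds
```

Printing first keeps the step reversible: nothing changes until the output is reviewed and executed inside the container.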
Configuration changes:

  * various sysctl.conf options do not work inside a container
  * loading modules via /etc/modules does not work

If you get errors like:

  INIT: Id "6" respawning too fast: disabled for 5 minutes

disable the matching line in /etc/inittab:

  # 5:23:respawn:/sbin/getty 38400 tty5
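
Commenting out such a line can also be scripted. The following is a minimal sketch, assuming GNU sed (for ''-i'') and the usual "id:runlevels:respawn:command" inittab layout; ''disable_inittab_id'' is a hypothetical helper that works on whatever file it is given, so it can be tried on a copy before touching /etc/inittab.

```shell
#!/bin/sh
# Sketch: comment out the respawn line for one inittab id.
# disable_inittab_id is a hypothetical helper; $1 is the id, $2 the file.
# Assumes GNU sed (in-place editing with -i).
disable_inittab_id() {
    id="$1"
    file="$2"
    # Prefix lines that start with "<id>:" and contain ":respawn:" with "# ".
    sed -i "/^${id}:.*:respawn:/s/^/# /" "$file"
}

# Try it on a scratch file first.
tmp=$(mktemp)
printf '%s\n' '1:2345:respawn:/sbin/getty 38400 tty1' \
              '5:23:respawn:/sbin/getty 38400 tty5' > "$tmp"
disable_inittab_id 5 "$tmp"
cat "$tmp"
rm -f "$tmp"
```

After changing the real /etc/inittab, init has to re-read it (for example on the next container restart) before the respawn messages stop.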