====== LXC ======
===== Installation =====
* https://wiki.debian.org/LXC
===== Networking =====
Options:
* Create a bridge on the host (NATed or routed)
* VLAN + bridge setup
* Use the libvirt package for easy network setup
Network resources:
* http://manpages.ubuntu.com/manpages/precise/man5/lxc.conf.5.html
* Working bridge config (French): http://blog.champs-libres.coop/sysadmin/2015/08/18/ip-failover-ovh-container-lxc.html
* NAT and bridge how-to (note: adding a bridge manually may cause network loss after 30-60 min): https://stackoverflow.com/questions/25042542/how-do-i-connect-a-lxc-container-to-an-ip-alias?rq=1
* https://wiki.debian.org/LXC/SimpleBridge
* http://wiki.libvirt.org/page/Networking#Bridged_networking_.28aka_.22shared_physical_device.22.29
* ARP debugging http://www.claudiokuenzler.com/blog/551/network-problem-lxc-same-subnet-as-host-in-vmware
* ARP debugging http://www.microhowto.info/troubleshooting/troubleshooting_ethernet_bridging_on_linux.html
==== Bridge ====
Exposes the public IPs directly to a container.
Setup for OVH / SoYouStart failover IPs:
The failover IPs can be mapped directly into containers.
**A virtual MAC address must be assigned to the failover IP in the OVH web manager first!**
Watch for too many ARP requests from the container, OVH may block the IP! **The requests can be avoided by assigning virtual MACs to all failover IPs.** Check with:
tcpdump -n -v arp
**Do not set up the failover IPs in /etc/network/interfaces on the host or in the VS, only in the container config file!**
**Move eth0 to br0 in /etc/network/interfaces: eth0 becomes "manual" and is added as "bridge_ports eth0":**
auto eth0
iface eth0 inet manual
auto br0
iface br0 inet static
address 1.2.3.4
netmask 255.255.255.0
network 1.2.3.0
broadcast 1.2.3.255
# at OVH the gateway is always the main IP's x.x.x.254
gateway 1.2.3.254
# add these devices to the bridge (can be eth0 eth1 ..)
bridge_ports eth0
# no spanning tree protocol
bridge_stp off
bridge_fd 0
bridge_maxwait 0
**Container config:**
# comment out the default empty network interface
#lxc.network.type = empty
# bridge setup:
# veth: a virtual ethernet pair is created, the host end is attached to the bridge
lxc.network.type = veth
# bring the interface up when the container starts
lxc.network.flags = up
# host bridge interface
lxc.network.link = br0
# failover IP to use, /24
lxc.network.ipv4 = fail.over.ip.x/24
# the gateway of the main IP is x.x.x.254
lxc.network.ipv4.gateway = MAIN.IP.GATEWAY.254
# IMPORTANT! the virtual MAC address assigned to the failover IP in the web manager
lxc.network.hwaddr = 02:00:00:3c:95:31
Inside the VS, /etc/network/interfaces:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet manual
#auto eth0
#iface eth0 inet dhcp
**Setup bridge options in /etc/sysctl.conf:**
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
**reload:**
sysctl -p /etc/sysctl.conf
\\
==== Simple Nat Bridge ====
Easy version without libvirt; works well at OVH and Hetzner.
**Add an additional bridge (keep eth0 as is) in /etc/network/interfaces**
auto lxc-bridge
iface lxc-bridge inet static
bridge_ports none
bridge_fd 0
bridge_maxwait 0
address 192.168.10.1
netmask 255.255.255.0
up iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
**activate forwarding temporarily:**
echo 1 > /proc/sys/net/ipv4/ip_forward
**activate forwarding permanently:**
Uncomment in /etc/sysctl.conf:
net.ipv4.ip_forward=1
Activate new settings:
sysctl -p
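To confirm that forwarding is really active after ''sysctl -p'', the kernel value can be read back. A minimal sketch; the function name ''ip_forward_status'' is made up for this example:

<code bash>
#!/bin/sh
# Hypothetical helper: report whether IPv4 forwarding is currently enabled.
# Reads the same kernel value that "echo 1 > /proc/sys/net/ipv4/ip_forward" sets.
ip_forward_status() {
    if [ "$(cat /proc/sys/net/ipv4/ip_forward 2>/dev/null)" = "1" ]; then
        echo "IPv4 forwarding: enabled"
    else
        echo "IPv4 forwarding: disabled"
    fi
}

ip_forward_status
</code>

''sysctl net.ipv4.ip_forward'' shows the same value.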
**firewall rules**
# internal -> external
iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -j SNAT --to-source 1.2.3.4
# ports external -> internal: one rule for each $PORT
iptables -t nat -A PREROUTING -d 1.2.3.4 -p tcp --dport $PORT -j DNAT --to-destination 192.168.10.10:${PORT}
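With several forwarded ports, the per-port DNAT rule can be generated in a loop. A sketch, assuming the example addresses from above (1.2.3.4 public, 192.168.10.10 container) and an example port list; the variable and function names are hypothetical. It only prints the commands, so they can be reviewed before being applied:

<code bash>
#!/bin/sh
# Example values (assumptions): adjust to your host and container.
PUBLIC_IP=1.2.3.4
CONTAINER_IP=192.168.10.10
PORTS="80 443"

# Print one DNAT rule per TCP port; nothing is applied here.
dnat_rules() {
    for port in $PORTS; do
        echo "iptables -t nat -A PREROUTING -d $PUBLIC_IP -p tcp --dport $port -j DNAT --to-destination $CONTAINER_IP:$port"
    done
}

dnat_rules
</code>

To apply, pipe the output into ''sh'' as root. Port 22 is left out of the example on purpose: forwarding it would redirect SSH for the host's public IP into the container.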
**Container config:**
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxc-bridge
lxc.network.ipv4.gateway = 192.168.10.1
lxc.network.ipv4 = 192.168.10.10/24
* https://wiki.debian.org/LXC/SimpleBridge
\\
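==== NAT Bridge via libvirt and IPv6 ====
**Deprecated! If you do not need DHCP, use the simple bridge method without libvirt/virsh!**
apt-get install libvirt-bin
# on newer Debian releases libvirt-bin is replaced by:
# apt-get install libvirt-daemon-system libvirt-clients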
virsh net-info default
virsh net-start default
**autostart:**
virsh net-autostart default
virsh net-info default
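**Configure via virsh net-edit (do not edit the XML file directly, it gets overwritten!) and set static IPs:**
First, set mcedit as the default editor in /root/.bashrc:
export EDITOR='mcedit'
virsh net-edit default
# the network XML starts with its name and UUID, e.g.:
# <name>default</name>
# <uuid>a07016bb-2e96-2000-9e16-b93b12245329</uuid>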
**edit VS config, add:**
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = virbr0
# not required and not useful: the connection between VSs breaks when this is set
# lxc.network.hwaddr = CC:AA:FF:EE:00:01
lxc.network.ipv4 = 192.168.122.100/24
# auto usually works, otherwise set the main IP's gateway (.254 at OVH)
lxc.network.ipv4.gateway = auto
# ipv6:
lxc.network.ipv6 = 1234:1234:1234:1234::100/64
lxc.network.ipv6.gateway = auto
**Edit /etc/network/interfaces __inside__ the VS, just set eth0 to manual:**
# The loopback network interface
auto lo
iface lo inet loopback
iface lo inet6 loopback
## IPv4:
auto eth0
iface eth0 inet manual
## IPv6:
iface eth0 inet6 static
address 1234:1234:1234:1234::131
netmask 64
**Check /etc/resolv.conf inside the VS:**
# for testing, add Google's DNS:
nameserver 8.8.8.8
**make persistent:**
virsh net-autostart default
**restart network:**
virsh net-destroy default
virsh net-start default
**If you remove this network, disable autostart:**
virsh net-autostart default --disable
**activate forwarding temporarily:**
echo 1 > /proc/sys/net/ipv4/ip_forward
**activate forwarding permanently:**
Uncomment in /etc/sysctl.conf
net.ipv4.ip_forward=1
Activate new settings:
sysctl -p
**iptables config (1.2.3.4 is the public IP of the root system):**
iptables -t nat -I PREROUTING -p tcp -d 1.2.3.4 --dport 80 -j DNAT --to-destination 192.168.122.100:80
iptables -I FORWARD -m state -d 192.168.122.0/24 --state NEW,RELATED,ESTABLISHED -j ACCEPT
**ip6tables config for IPv6:**
echo "generic IPv6 setup.."
ip6tables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# NEW or not? NEW,
ip6tables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
ip6tables -t filter -A INPUT -i lo -j ACCEPT
ip6tables -t filter -A OUTPUT -o lo -j ACCEPT
ip6tables -t filter -P FORWARD ACCEPT
# ping
ip6tables -t filter -A INPUT -p ipv6-icmp -j ACCEPT
ip6tables -t filter -A OUTPUT -p ipv6-icmp -j ACCEPT
echo "IPv6 setup FOREACH vserver.."
ip -6 route add 1234:1234:1234:1234::100 dev virbr0
ip -6 neigh add proxy 1234:1234:1234:1234::100 dev eth0
ping6 -I virbr0 -c 5 1234:1234:1234:1234::100
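The three per-vserver commands have to be repeated for every VS. A sketch that prints them for a list of addresses; the second address ''::101'', the variable names and the function name are example assumptions, and nothing runs until the output is piped into ''sh'' as root:

<code bash>
#!/bin/sh
# Example values (assumptions): container IPv6 addresses and interface names.
CONTAINER_V6="1234:1234:1234:1234::100 1234:1234:1234:1234::101"
BRIDGE=virbr0
UPLINK=eth0

# Print route, proxy-NDP entry and a warm-up ping for each container,
# matching the manual commands above.
ipv6_container_cmds() {
    for addr in $CONTAINER_V6; do
        echo "ip -6 route add $addr dev $BRIDGE"
        echo "ip -6 neigh add proxy $addr dev $UPLINK"
        echo "ping6 -I $BRIDGE -c 5 $addr"
    done
}

ipv6_container_cmds
</code>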
\\
===== IPv6 setup =====
.. is tricky!
Besides the settings in the previous section (virsh, iptables, interfaces, ...), it may be necessary to set up:
/etc/sysctl.conf (working settings):
# v2 - vs ipv6:
net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.default.autoconf = 0
net.ipv6.conf.eth0.autoconf = 0
net.ipv6.conf.default.accept_ra = 0
# accept_ra = 2 seems a bit weird for a boolean. It's a special value to allow IPv6 forwarding AND RA
net.ipv6.conf.all.accept_ra = 2
net.ipv6.conf.eth0.accept_ra = 2
# bridging (tk)
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
# required for IPv6 forwarding
net.ipv6.conf.all.proxy_ndp = 1
#net.ipv6.conf.eth0.proxy_ndp = 1
#net.ipv6.conf.virbr0.proxy_ndp=1
net.ipv6.conf.default.forwarding = 1
The radvd router advertisement daemon seems to be required, too:
/etc/radvd.conf
interface virbr0
{
AdvReachableTime 300000;
# MinRtrAdvInterval 60;
# MaxRtrAdvInterval 120;
MinRtrAdvInterval 3;
MaxRtrAdvInterval 10;
AdvSendAdvert on;
AdvManagedFlag off;
AdvOtherConfigFlag off;
prefix ::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr on;
};
}
**Helpful commands - inside and outside of container:**
ip -6 r
ip neigh show
ping6 container_ip
ping6 world_from_container
FIXME
* IPv6 often starts to work after pinging a few times in both directions (in <-> out): the neighbour entries need to be built up, and they go stale after a short time
* including ping6 in the firewall startup script builds up the route
* a proxy NDP entry is required, see ''ip -6 neigh add proxy'' in the ip6tables block above and ''proxy_ndp'' in sysctl.conf
\\
===== Usage =====
==== Create new container ====
For unprivileged containers, edit /etc/subuid and /etc/subgid first and add matching lines to /etc/lxc/default.conf (see below)!
http://wiki.fr33.info/doku.php/linux/virtualization/lxc?unprivileged_containers
FIX: the default keyserver is broken! Add: --keyserver hkp://keyserver.ubuntu.com
lxc-create -n debian8 -B btrfs -t debian -- -r jessie --keyserver hkp://keyserver.ubuntu.com
or
lxc-create -n websrv -t debian-wheezy -B btrfs -- --keyserver hkp://keyserver.ubuntu.com
Start / Stop VS:
lxc-start -n websrv
lxc-stop -n websrv
Enter VS:
lxc-console -n websrv
In Buster, use the lxc-download script:
/usr/share/lxc/templates/lxc-download --list --no-validate| grep debian | grep amd64
lxc-create -t /usr/share/lxc/templates/lxc-download -n -- --no-validate -d debian -r buster -a amd64
==== Clone container ====
Copy all data:
lxc-clone --backingstore btrfs --orig vs1 --new vs2
Make a btrfs snapshot:
lxc-clone --backingstore btrfs --orig vs1 --new vs2 --snapshot
# LXC 2.0 and later replace lxc-clone with lxc-copy, e.g.:
# lxc-copy -n vs1 -N vs2 -s
\\
===== Mount external Dirs in Container =====
The recommended way is to add the mountpoint with a relative path in the VS config:
lxc.mount.entry=/home/mountme home none bind,optional,relative,create=dir
Under some circumstances (in unprivileged containers) it does not work, but this does:
lxc.mount.entry = /home/test /home/vservers/stretch/rootfs/home/test none bind 0 0
Also check permissions and ownership: chown the directory to the container's root ID as seen from the host (e.g. 100000).
\\
===== btrfs snapshots =====
The container must be stopped for lxc-snapshot. Use a btrfs snapshot to back up running containers (MySQL data may become inconsistent).
The container must be created with the option -B btrfs!
lxc-create -B btrfs -n mycontainer -t ubuntu
Workaround to add btrfs snapshots to an existing container (stop it first):
mv /home/vservers/my-lxc-container/rootfs /home/vservers/my-lxc-container/rootfs.saved
btrfs subvolume create /home/vservers/my-lxc-container/rootfs
btrfs subvolume list /home/vservers
# for an unprivileged container, check the UID and GID of the rootfs dir (here it is 100000):
chown 100000:100000 /home/vservers/my-lxc-container/rootfs/
mv /home/vservers/my-lxc-container/rootfs.saved/* /home/vservers/my-lxc-container/rootfs/
lxc-snapshot -n my-lxc-container
Snapshot with a comment:
echo "working my-lxc-container before ..." > snap-comment
lxc-stop -n my-lxc-container
lxc-snapshot -n my-lxc-container -c snap-comment
rm snap-comment
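The comment-file steps above can be wrapped so the temporary file is always cleaned up. A sketch; ''snapshot_with_comment'', ''run'' and ''DRY_RUN'' are hypothetical names, only ''lxc-stop -n'' and ''lxc-snapshot -n ... -c <file>'' come from the steps above:

<code bash>
#!/bin/sh
# Hypothetical helper (not part of LXC): write the comment to a temporary
# file, stop the container and take an lxc-snapshot with that comment.
# With DRY_RUN=1 the lxc commands are only printed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

snapshot_with_comment() { # $1 = container name, $2 = comment text
    commentfile="$(mktemp)" || return 1
    printf '%s\n' "$2" > "$commentfile"
    run lxc-stop -n "$1"
    run lxc-snapshot -n "$1" -c "$commentfile"
    rc=$?
    rm -f "$commentfile"
    return "$rc"
}
</code>

Usage, e.g.: ''snapshot_with_comment my-lxc-container "working my-lxc-container before ..."''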
* https://uli-heller.github.io/blog/2013/06/09/lxc-snapshots/
==== Copy container ====
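To move a container to another machine, preserve the numeric user/group IDs (run tar as root):
pack (inside the container directory):
tar --numeric-owner -czvf container.tar.gz ./*
move the archive to the new machine, then
unpack (inside the new, empty container directory):
tar --numeric-owner -xzvf container.tar.gz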
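The same pack/unpack steps as two small functions. A sketch; the function names are hypothetical. ''-C'' lets tar work on the container directory without changing into it, and ''.'' also includes hidden files:

<code bash>
#!/bin/sh
# Hypothetical helpers for moving a container directory between hosts.
# --numeric-owner keeps the raw (shifted) UIDs/GIDs such as 100000 instead of
# mapping them through user names; run both as root so ownership is kept.
pack_container() {   # $1 = container dir, $2 = archive path (outside $1)
    tar --numeric-owner -czf "$2" -C "$1" .
}

unpack_container() { # $1 = archive path, $2 = target dir
    mkdir -p "$2" && tar --numeric-owner -xzf "$1" -C "$2"
}
</code>

For example ''pack_container /home/vservers/websrv /tmp/websrv.tar.gz'' on the old host and ''unpack_container /tmp/websrv.tar.gz /home/vservers/websrv'' on the new one.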
\\
===== Security =====
==== Unprivileged containers ====
UIDs and GIDs are shifted to another range, so root (UID 0) becomes e.g. 100000. This is not visible inside the container, but from outside you see UIDs of 100000 and above. You can still run these containers as root: you only have to add root to /etc/subuid and /etc/subgid, then it's the same as running the containers as a user.
For best security, each container should have its own UID/GID range, although breaking out of one container and into another is unlikely.
Only available in LXC >= 1.0, not in Debian Squeeze :(
**run unprivileged container as root:**
Add root to /etc/subuid and /etc/subgid:
root:100000:65536
**vs config - map user ids:**
put this in /etc/lxc/default.conf too!
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
In Buster the key is called lxc.idmap (renamed in LXC 2.1):
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
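To give each container its own range, the start ID can be computed from a container number. A sketch, assuming consecutive ranges of 65536 IDs starting at 100000 as above; ''idmap_for'', ''host_id'' and the numbering scheme are hypothetical:

<code bash>
#!/bin/sh
# Hypothetical helper: container number N gets its own 65536-ID range,
# starting at 100000 (the base used above). Prints the /etc/subuid and
# /etc/subgid entry for root and the matching lxc.idmap lines.
BASE=100000
RANGE=65536

idmap_for() { # $1 = container number (0, 1, 2, ...)
    start=$((BASE + $1 * RANGE))
    echo "# /etc/subuid and /etc/subgid: root:$start:$RANGE"
    echo "lxc.idmap = u 0 $start $RANGE"
    echo "lxc.idmap = g 0 $start $RANGE"
}

# Host-side ID that a container-side ID maps to, e.g. for chown from the host.
host_id() { # $1 = start of the container's range, $2 = ID inside the container
    echo $(($1 + $2))
}

idmap_for 1
</code>

''idmap_for 1'' yields the range starting at 165536; root then needs that range in /etc/subuid and /etc/subgid as an additional line (several ranges per user are allowed). ''host_id 165536 33'' gives 165569, the host-side owner for UID 33 inside that container. On LXC < 2.1 use ''lxc.id_map'' instead.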
**Shift UIDs/GIDs to another range:**
Use this script: https://github.com/exaexa/chownmap
/root/bin/chownmap 0 200000 65536 /home/vservers//rootfs/
Create the container with the download template for unprivileged containers. Jessie is not available, so you can install Wheezy, upgrade it and fix the systemd error :(
FIX for download: the default keyserver is broken, add --keyserver hkp://keyserver.ubuntu.com
lxc-create -B btrfs -t download -n websrv -- --keyserver hkp://keyserver.ubuntu.com
# error: no jessie image:
lxc-create -B btrfs -n websrv -t download -- -d debian -r jessie -a amd64 --keyserver hkp://keyserver.ubuntu.com
# error: does not work for unprivileged containers:
LANG=C SUITE=jessie MIRROR=http://httpredir.debian.org/debian lxc-create -n websrv -B btrfs -t debian
**Unprivileged related bugfixes**
lxc-start: Permission denied - failed to create directory '/usr/lib/x86_64-linux-gnu/lxc/rootfs/lxc_putold'
This is caused by wrong ownership of rootfs. Set:
chown 100000:100000 /vservers//rootfs
If you copy files from outside into the container, they have the wrong UID/GID. If a file should belong to root inside the container, run this on the host:
chown 100000:100000 /vservers//rootfs/
**Links:**
* https://unix.stackexchange.com/questions/127554/building-unprivileged-userns-lxc-container-from-scratch-by-migrating-a-privil
* https://www.stgraber.org/2014/01/01/lxc-1-0-security-features/
* https://blog.deimos.fr/2014/08/29/lxc-1-0-on-debian-wheezy/
* https://wiki.deimos.fr/LXC_:_Install_and_configure_the_Linux_Containers
* https://unix.stackexchange.com/questions/177030/what-is-an-unprivileged-lxc-container
\\
===== Bugfixes =====
* If a container doesn't start, check if /dev /sys /proc exist
* Check log files in /containerrootdir/
* Check network settings in config
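The first check in the list can be scripted. A sketch; ''check_rootfs'' is a hypothetical name and the rootfs path is passed in by the caller:

<code bash>
#!/bin/sh
# Hypothetical helper: report which of the basic mount-point directories are
# missing from a container's rootfs (a common reason for a failed start).
# Exit status 0 = all present, 1 = something missing.
check_rootfs() { # $1 = path to the container's rootfs
    missing=0
    for d in dev sys proc; do
        if [ ! -d "$1/$d" ]; then
            echo "missing: $1/$d"
            missing=1
        fi
    done
    return $missing
}
</code>

For example ''check_rootfs /home/vservers/websrv/rootfs''.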
**Network - no outgoing connection from container:**
* check resolv.conf
* reboot: iptables may be messed up
**Squeeze-LTS Containers are tricky:**
* If they don't start, use /etc/ from a default container.
* If you cannot log in ("no more processes in this runlevel"), use /etc/inittab from a default container. It has a different tty setup:
#5:23:respawn:/sbin/getty 38400 tty5
#6:23:respawn:/sbin/getty 38400 tty6
* If you get "TERM variable not set", set it, e.g. in /root/.bashrc:
TERM=linux
export TERM
**Jessie Containers are tricky:**
systemd prevents the container from starting.
Remove systemd, or prevent its installation before upgrading to Jessie:
http://without-systemd.org/wiki/index.php/How_to_remove_systemd_from_a_Debian_jessie/sid_installation
Alternatively, fix it by adding this to the container config. This did not always work; sometimes dbus errors appear when running apt:
# Custom container options
lxc.mount.auto = cgroup:mixed
lxc.mount.entry = tmpfs dev/shm tmpfs rw,nosuid,nodev,create=dir 0 0
lxc.mount.entry = tmpfs run tmpfs rw,nosuid,nodev,mode=755,create=dir 0 0
lxc.mount.entry = tmpfs run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,create=dir 0 0
# lxc.mount.entry = debugfs sys/kernel/debug debugfs rw,relatime 0 0
lxc.mount.entry = mqueue dev/mqueue mqueue rw,relatime,create=dir 0 0
# lxc.mount.entry = hugetlbfs dev/hugepages hugetlbfs rw,relatime,create=dir 0 0
Also make sure that you have the following line in your /etc/lxc/lxc.conf:
lxc.cgroup.use = @all
* https://github.com/debops/ansible-lxc/issues/15
**systemd cgroups mess**
Could not find writable mount point for cgroup hierarchy 12 while trying to create cgroup
Hierarchy 12 is a systemd hierarchy; if you remove systemd and switch to sysvinit-core, it might be left over.
FIXME:
Check that all of systemd is gone (uninstall any package still listed as ii):
dpkg -l '*systemd*'
apt remove --purge '*systemd*' # without systemd apt/preferences.d/ must not be set
In /etc/pam.d/common-session, comment out this line:
session optional pam_cgfs.so -c freezer,memory,name=systemd
Check whether hierarchy 12 is still active:
cat /proc/self/cgroup
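Instead of reading the whole file, the relevant lines can be filtered. A sketch; ''systemd_cgroup_lines'' is a hypothetical name, and the hierarchy number 12 is taken from the error message above and may differ on other systems:

<code bash>
#!/bin/sh
# Hypothetical helper: print the cgroup entries for hierarchy 12 or for the
# named "systemd" hierarchy from a /proc/<pid>/cgroup-style file.
# Default file: /proc/self/cgroup (the current shell). Exit status 0 = found.
systemd_cgroup_lines() { # $1 = cgroup file
    grep -E '^12:|name=systemd' "${1:-/proc/self/cgroup}"
}

systemd_cgroup_lines || echo "no hierarchy 12 / name=systemd entry found"
</code>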
WORKAROUND:
mcedit /etc/lxc/lxc.conf and remove
lxc.cgroup.use = @all
* this is helpful: https://github.com/lxc/lxc/issues/1279
* this is not: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=769494
\\
**SSH Config**
**root login**
The default Debian Jessie container has this option set, so you cannot log in as root with a password:
PermitRootLogin without-password
**pam login error**
error: PAM: pam_open_session(): Cannot make/remove an entry for the specified session
/etc/pam.d/sshd
# session required pam_loginuid.so
* http://gaijin-nippon.blogspot.de/2013/07/audit-on-lxc-host.html
or run this inside the container:
sed '/pam_loginuid.so/s/^/#/g' -i /etc/pam.d/*
FIXME maybe insecure
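A more careful variant of the sed line: only files that mention the module are touched, a backup is kept, and lines that are already commented out are left alone. A sketch; ''disable_pam_loginuid'' is a hypothetical name. It does not resolve the FIXME, since the security impact of dropping pam_loginuid is the same:

<code bash>
#!/bin/sh
# Hypothetical helper: comment out pam_loginuid.so lines in the PAM files of a
# directory (default /etc/pam.d, run inside the container). Only files that
# contain the module are changed, and each one keeps a .bak copy.
disable_pam_loginuid() { # $1 = PAM directory
    dir="${1:-/etc/pam.d}"
    for f in $(grep -l 'pam_loginuid\.so' "$dir"/* 2>/dev/null); do
        case "$f" in *.bak) continue ;; esac
        # skip lines that are already commented out
        sed -i.bak '/^[[:space:]]*#/!{/pam_loginuid\.so/s/^/#/;}' "$f"
    done
}
</code>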
**apt-get update broken via ipv6**
apt-get update -o Acquire::ForceIPv4=true
Force IPv4 for apt permanently:
echo 'Acquire::ForceIPv4 "true";' | tee /etc/apt/apt.conf.d/99force-ipv4
**rsyslog error**
TESTME
rsyslog doesn't start on boot, and syslog shows these errors:
.. rsyslogd: imklog: cannot open kernel log(/proc/kmsg): Permission denied.
.. rsyslogd-2145: activation of module imklog failed [try http://www.rsyslog.com/e/2145 ]
Disable kernel logging in container /etc/rsyslog.conf:
# $ModLoad imklog # provides kernel logging support
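The same change with sed, covering both the legacy ''$ModLoad imklog'' line and the newer ''module(load="imklog")'' syntax used by recent rsyslog versions. A sketch; ''disable_imklog'' is a hypothetical name, and a .bak backup is kept:

<code bash>
#!/bin/sh
# Hypothetical helper: comment out the kernel-log input in an rsyslog config.
# Matches "$ModLoad imklog" and module(load="imklog" ...); lines already
# commented out are not matched again.
disable_imklog() { # $1 = config file, default /etc/rsyslog.conf
    sed -i.bak -E '/^[[:space:]]*(\$ModLoad[[:space:]]+imklog|module\(load="imklog")/s/^/# /' "${1:-/etc/rsyslog.conf}"
}
</code>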
\\
===== Move a root system into container =====
Many services can be disabled:
* udev: the udev service (a hard dependency of systemd in Jessie) won't run in a container, but systemd recognizes this
* apparmor: mounts securityfs, which would require disabling the capability drop (lxc.cap.drop); bad idea
* kmod
* lm-sensors
* dbus
* kbd
* hdparm
* ...
Configuration changes:
* various sysctl.conf options do not work inside a container
* loading modules via /etc/modules does not work
If you get errors like:
INIT: Id "6" respawning too fast: disabled for 5 minutes
disable the matching line in /etc/inittab:
# 6:23:respawn:/sbin/getty 38400 tty6
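The entry can also be commented out by its id, the value quoted in the INIT error. A sketch; ''disable_inittab_id'' is a hypothetical name, and a .bak backup is kept:

<code bash>
#!/bin/sh
# Hypothetical helper: comment out the /etc/inittab entry with the given id,
# e.g. "6" for 'INIT: Id "6" respawning too fast'.
disable_inittab_id() { # $1 = id, $2 = inittab path (default /etc/inittab)
    sed -i.bak "/^$1:/s/^/# /" "${2:-/etc/inittab}"
}
</code>

After editing, ''telinit q'' makes sysvinit re-read /etc/inittab.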