LXC

Installation

Networking

Bridge

Exposes the public IPs into a container.

Setup for OVH / SoYouStart failover IPs

The failover IPs can be mapped directly into containers.

A virtual MAC address must be assigned to the failover IP in the OVH web manager first!

Check for too many ARP requests from the container! OVH may block the IP. The requests can be avoided by assigning Virtual MACs to all failover IPs:

tcpdump -v arp

Do not set up the failover IPs in /etc/network/interfaces on the host or in the VS - only in the container config file!

Move eth0 to br0 in /etc/network/interfaces - eth0 becomes “manual” and is added to the bridge as “bridge_ports eth0”:

# note: comments must be on their own line in this file
auto eth0
iface eth0 inet manual

auto br0
iface br0 inet static
      address 1.2.3.4
      netmask 255.255.255.0
      network 1.2.3.0
      broadcast 1.2.3.255
      # at OVH the gateway is always the main IP's x.x.x.254
      gateway 1.2.3.254
      # devices to add to the bridge (can be several: eth0 eth1 ..)
      bridge_ports eth0
      # no spanning tree protocol
      bridge_stp off
      bridge_fd 0
      bridge_maxwait 0
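
The gateway rule from the comment above (main IP with the last octet replaced by 254) can be sketched as a small shell helper. The function name ovh_gateway is made up for this example, and it assumes a plain dotted IPv4 address as input:

```shell
# Hypothetical helper: derive the OVH gateway (x.x.x.254) from the main IPv4 address.
ovh_gateway() {
    # ${1%.*} strips the last octet, then .254 is appended
    printf '%s.254\n' "${1%.*}"
}

ovh_gateway 1.2.3.4
```

The result can be pasted into the gateway line of br0 and into the container config.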

Container config:

# comment out the default empty network interface
#lxc.network.type = empty
# bridge setup (comments must be on their own lines):
# veth: a virtual ethernet pair is created, one end is attached to the bridge
lxc.network.type = veth
# bring the interface up when the VS starts
lxc.network.flags = up
# host bridge interface
lxc.network.link = br0
# failover IP to use, with /24
lxc.network.ipv4 = fail.over.ip.x/24
# the gateway of the main IP is x.x.x.254
lxc.network.ipv4.gateway = MAIN.IP.GATEWAY.254
# IMPORTANT! the MAC address assigned to the failover IP in the web manager
lxc.network.hwaddr = 02:00:00:3c:95:31
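
With several failover IPs the same stanza is needed once per container. A minimal sketch that prints it from three arguments; the function name lxc_failover_net and the address 5.6.7.8 are placeholders invented for this example, the keys are the ones used above:

```shell
# Hypothetical helper: print the network stanza for one container.
# $1 = failover IP, $2 = gateway of the main IP, $3 = virtual MAC from the web manager
lxc_failover_net() {
    cat <<EOF
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = $1/24
lxc.network.ipv4.gateway = $2
lxc.network.hwaddr = $3
EOF
}

lxc_failover_net 5.6.7.8 1.2.3.254 02:00:00:3c:95:31
```

The output only goes to stdout; append it to the container config file by hand after checking it.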

Inside VS /etc/network/interfaces:

auto lo
iface lo inet loopback
auto eth0
iface eth0 inet manual
#auto eth0
#iface eth0 inet dhcp

Setup bridge options in /etc/sysctl.conf:

net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

reload:

sysctl -p /etc/sysctl.conf  


Simple Nat Bridge

Easy version without libvirt - works well at OVH and Hetzner.

Add an additional bridge (keep eth0 as is) in /etc/network/interfaces:

auto lxc-bridge
iface lxc-bridge inet static
      bridge_ports none
      bridge_fd 0
      bridge_maxwait 0
      address 192.168.10.1
      netmask 255.255.255.0
      up iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

Activate forwarding temporarily:

echo 1 > /proc/sys/net/ipv4/ip_forward

Activate forwarding permanently:

Uncomment in /etc/sysctl.conf

net.ipv4.ip_forward=1

Activate new settings:

sysctl -p

Firewall rules:

# intern -> extern (source is the whole container network)
iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -j SNAT --to-source 1.2.3.4
# ports extern -> intern - 1 rule for each $PORT
iptables -t nat -A PREROUTING  -d 1.2.3.4 -p tcp --dport $PORT -j DNAT --to-destination 192.168.10.10:${PORT}
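
Since one DNAT rule is needed per port, a loop saves typing. A minimal sketch as a dry run: it only prints the commands, nothing is applied. The function name dnat_rules and the port list are made up for this example:

```shell
PUBLIC_IP=1.2.3.4
CONTAINER_IP=192.168.10.10

# Hypothetical helper: print one DNAT rule per port given as argument.
dnat_rules() {
    for port in "$@"; do
        printf 'iptables -t nat -A PREROUTING -d %s -p tcp --dport %s -j DNAT --to-destination %s:%s\n' \
            "$PUBLIC_IP" "$port" "$CONTAINER_IP" "$port"
    done
}

dnat_rules 80 443 2222
```

Check the printed rules first, then pipe them to sh as root to apply them.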

Container config:

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxc-bridge
lxc.network.ipv4.gateway = 192.168.10.1
lxc.network.ipv4 = 192.168.10.10/24


Nat Bridge via libvirt and IPV6

Deprecated! - if you do not need DHCP, use the simple bridge method without libvirt/virsh!

apt-get install libvirt-bin
virsh net-info default
virsh net-start default

autostart:

virsh net-autostart default
virsh net-info default

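Configure and set static IPs. Do not edit the network XML file directly with an editor, it gets overwritten! Use virsh net-edit instead.

First set mcedit as the default editor in /root/.bashrc:

export EDITOR='mcedit'

Then open the network definition:

virsh net-edit default
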
<network>
  <name>default</name>
  <uuid>a07016bb-2e96-2000-9e16-b93b12245329</uuid>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0' />
  <mac address='51:51:00:FC:01:E1'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254' />
      <host mac='CC:AA:FF:EE:00:01' name='websrv.somedomain.com' ip='192.168.122.100' />
    </dhcp>
  </ip>
  <!-- IPv6 -->
  <ip family='ipv6' address='1234:1234:1234:1234::2' prefix='64'>
  </ip>
</network>

edit VS config, add:

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = virbr0
# lxc.network.hwaddr = CC:AA:FF:EE:00:01   # not required and not useful: inter-VS connections break when set
lxc.network.ipv4 = 192.168.122.100/24
# auto usually works, otherwise set the main IP's gateway (.254 at OVH)
lxc.network.ipv4.gateway = auto
# ipv6:
lxc.network.ipv6 = 1234:1234:1234:1234::100/64
lxc.network.ipv6.gateway = auto

Edit /etc/network/interfaces inside the VS - just set eth0 to manual:

# The loopback network interface
auto lo
iface lo inet loopback
iface lo inet6 loopback
## IPv4:
auto eth0
iface eth0 inet manual
## IPv6:
iface eth0 inet6 static
  address 1234:1234:1234:1234::131
  netmask 64

Check /etc/resolv.conf inside VS:

# for testing, add google's dns:
nameserver 8.8.8.8

make persistent:

virsh net-autostart default

restart network:

virsh net-destroy default
virsh net-start default

If you remove this net, disable autostart:

virsh net-autostart default --disable

Activate forwarding temporarily:

echo 1 > /proc/sys/net/ipv4/ip_forward

activate forwarding permanently:

Uncomment in /etc/sysctl.conf

net.ipv4.ip_forward=1

Activate new settings:

sysctl -p

iptables config (1.2.3.4 is the public IP of the host):

iptables -t nat -I PREROUTING -p tcp -d 1.2.3.4 --dport 80 -j DNAT --to-destination 192.168.122.100:80
iptables -I FORWARD -m state -d 192.168.122.0/24 --state NEW,RELATED,ESTABLISHED -j ACCEPT

ip6tables config for IPv6:

echo "generic IPv6 setup.."
ip6tables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# NEW or not? NEW,
ip6tables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT

ip6tables -t filter -A INPUT -i lo -j ACCEPT
ip6tables -t filter -A OUTPUT -o lo -j ACCEPT
ip6tables -t filter -P FORWARD ACCEPT
# ping
ip6tables -t filter -A INPUT -p ipv6-icmp -j ACCEPT
ip6tables -t filter -A OUTPUT -p ipv6-icmp -j ACCEPT
echo "IPv6 setup FOREACH vserver.."
ip -6 route add 1234:1234:1234:1234::100 dev virbr0
ip -6 neigh add proxy 1234:1234:1234:1234::100 dev eth0
ping6 -I virbr0 -c 5 1234:1234:1234:1234::100
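
The per-vserver part has to be repeated for every container address. A minimal sketch that prints the route and proxy commands for a list of addresses (dry run, nothing is applied). The function name ipv6_proxy_cmds and the second address ::101 are made up for this example:

```shell
HOST_IF=eth0
BRIDGE_IF=virbr0

# Hypothetical helper: print route + NDP proxy commands for each container address.
ipv6_proxy_cmds() {
    for addr in "$@"; do
        printf 'ip -6 route add %s dev %s\n' "$addr" "$BRIDGE_IF"
        printf 'ip -6 neigh add proxy %s dev %s\n' "$addr" "$HOST_IF"
    done
}

ipv6_proxy_cmds 1234:1234:1234:1234::100 1234:1234:1234:1234::101
```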


IPv6 setup

.. is tricky!

Besides the settings in the previous section (virsh, iptables, interfaces, ..) it may be required to set up:

/etc/sysctl.conf - working setting

# v2 - vs ipv6:
net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.default.autoconf = 0
net.ipv6.conf.eth0.autoconf = 0
net.ipv6.conf.default.accept_ra = 0

# accept_ra is not a boolean: 2 means "accept router advertisements even if forwarding is enabled"
net.ipv6.conf.all.accept_ra = 2
net.ipv6.conf.eth0.accept_ra = 2
# bridging (tk)
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

# required for ipv6 forwarding
net.ipv6.conf.all.proxy_ndp = 1
#net.ipv6.conf.eth0.proxy_ndp = 1
#net.ipv6.conf.virbr0.proxy_ndp=1
net.ipv6.conf.default.forwarding = 1

The radvd router advertisement daemon seems to be required, too: /etc/radvd.conf

interface virbr0
{
  AdvReachableTime 300000;
#    MinRtrAdvInterval 60;
#    MaxRtrAdvInterval 120;

  MinRtrAdvInterval 3;
  MaxRtrAdvInterval 10;

  AdvSendAdvert on;
  AdvManagedFlag off;
  AdvOtherConfigFlag off;
  
  prefix <IPv6 here>::/64
  {
    AdvOnLink on;
    AdvAutonomous on;
    AdvRouterAddr on;
  };
};

Helpful commands - inside and outside of container:

ip -6 r
ip neigh show
ping6 container_ip
ping6 world_from_container

FIXME

  • IPv6 often only starts to work after pinging a few times in both directions (in ↔ out): the neighbour table has to be built up, and its entries go stale after a short time
  • including a ping6 in the firewall startup script builds up the route
  • something about “proxy” is required - see the ip6tables section above


Usage

Create new container

For unprivileged containers, edit /etc/subuid and /etc/subgid first and add matching lines to /etc/lxc/default.conf - see the Unprivileged containers section below: http://wiki.fr33.info/doku.php/linux/virtualization/lxc?&#unprivileged_containers

lxc-create -n debian8 -B btrfs -t debian -- -r jessie

or

lxc-create -n websrv -t debian-wheezy -B btrfs

Start / Stop VS:

lxc-start -n websrv
lxc-stop -n websrv

Enter VS:

lxc-console -n websrv

Clone container

Copy all data:

lxc-clone --backingstore btrfs --orig vs1 --new vs2

Make a btrfs snapshot:

lxc-clone --backingstore btrfs --orig vs1 --new vs2 --snapshot


btrfs snapshots

The container must be stopped for lxc-snapshot. Use a btrfs snapshot to back up running containers (MySQL data may end up inconsistent).

You need to create the container with the option -B btrfs!

lxc-create -B btrfs -n mycontainer -t ubuntu

Workaround to enable btrfs snapshots after the container has been created:

mv /home/vservers/my-lxc-container/rootfs /home/vservers/my-lxc-container/rootfs.saved
btrfs subvolume create /home/vservers/my-lxc-container/rootfs
btrfs subvolume list /home/vservers

# for an unprivileged root container, check the UID and GID of the rootfs dir (here it is 100000):
chown 100000:100000 /home/vservers/my-lxc-container/rootfs/
mv /home/vservers/my-lxc-container/rootfs.saved/* /home/vservers/my-lxc-container/rootfs/
lxc-snapshot -n my-lxc-container

snapshot with comment

echo "working my-lxc-container before ..." > snap-comment
lxc-stop -n my-lxc-container
lxc-snapshot -n my-lxc-container -c snap-comment
rm snap-comment

Copy container

To move a container to another machine, pack it with numeric owners so that the user/group IDs are preserved:

pack:

tar --numeric-owner -czvf container.tar.gz ./*

move..

unpack:

tar --numeric-owner -xzvf container.tar.gz
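
The pack/unpack round trip can be tried in a scratch directory before touching a real container. A minimal sketch; the directory layout and file contents are invented for this example, and tar is assumed to support --numeric-owner (GNU tar does):

```shell
# scratch directories standing in for the container dir on the old and new machine
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/rootfs/etc"
echo demo > "$src/rootfs/etc/hostname"
echo cfg > "$src/config"

# pack: -C changes into the container directory, "." also catches dotfiles
tar --numeric-owner -czf "$src.tar.gz" -C "$src" .

# unpack: no member list is needed on extraction
tar --numeric-owner -xzf "$src.tar.gz" -C "$dst"

ls -A "$dst"
```

Run as root on real containers, otherwise the ownership cannot be restored on extraction.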


Security

Unprivileged containers

UIDs and GIDs are shifted to another range, so root (UID 0) becomes 100000, for example. Inside the container this is not visible, but from outside you see UIDs 100000 and up. You can still run these containers as root: you only have to add root to /etc/subuid and /etc/subgid, then it is the same as running the containers as a regular user.

For best security, each container should have its own UID/GID range, although breaking out of one container and entering another is unlikely.

Only available in LXC 1.0 and later, not in Debian Squeeze :(

run unprivileged container as root:

Add root to /etc/subuid and /etc/subgid:

root:100000:65536

vs config - map user ids:

put this in /etc/lxc/default.conf too!

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
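
The id_map lines have to match the subuid/subgid entry. A minimal sketch that derives them from one entry; the function name idmap_lines is made up, and it assumes the same range is used in /etc/subuid and /etc/subgid:

```shell
# Hypothetical helper: turn a name:start:count entry into lxc.id_map lines.
idmap_lines() {
    entry=$1
    rest=${entry#*:}      # start:count
    start=${rest%%:*}     # first ID of the range on the host
    count=${rest#*:}      # number of IDs in the range
    printf 'lxc.id_map = u 0 %s %s\n' "$start" "$count"
    printf 'lxc.id_map = g 0 %s %s\n' "$start" "$count"
}

idmap_lines root:100000:65536
```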

Shift the UIDs/GIDs of an existing rootfs to another range:

Use this script: https://github.com/exaexa/chownmap

/root/bin/chownmap 0 200000 65536 /home/vservers/<containername>/rootfs/

Create container - use the download template for unprivileged containers. jessie is not available there, so you can upgrade from wheezy and fix the systemd error :(

lxc-create -B btrfs -t download -n websrv   
# error no jessie: 
lxc-create -B btrfs -n websrv -t download -- -d debian -r jessie -a amd64   
# error not working with unprivileged
LANG=C SUITE=jessie MIRROR=http://httpredir.debian.org/debian lxc-create -n websrv -B btrfs -t debian

Unprivileged related bugfixes

lxc-start: Permission denied - failed to create directory '/usr/lib/x86_64-linux-gnu/lxc/rootfs/lxc_putold'

This is caused by wrong ownership of the rootfs directory. Set:

chown 100000:100000 /vservers/<containername>/rootfs

If you copy files from outside into the container, they have the wrong UID/GID. If the file should belong to root, just run this from the root system:

chown 100000:100000 /vservers/<containername>/rootfs/<path to the file in container>
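
For files that should belong to another user inside the container, the host-side ID is the start of the mapped range plus the ID inside the container. A minimal sketch; the function name host_id is made up, and 33 is the usual www-data UID on Debian:

```shell
# Hypothetical helper: host-side ID = start of the mapped range + ID inside the container.
# $1 = range start (e.g. 100000), $2 = UID or GID inside the container
host_id() {
    echo $(( $1 + $2 ))
}

host_id 100000 0     # root inside the container
host_id 100000 33    # www-data inside the container
```

Use the result as owner and group in the chown call on the root system.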



Bugfixes

  • If a container doesn't start, check if /dev /sys /proc exist
  • Check log files in /containerrootdir/
  • Check network settings in config

Network - no outgoing connection from container:

  • check resolv.conf
  • reboot: iptables may be messed up

Squeeze-LTS Containers are tricky:

  • If they don't start, use /etc/ from a default container.
  • If you cannot log in (“no more processes in this runlevel”), use /etc/inittab from a default container. They have a different tty setup:
#5:23:respawn:/sbin/getty 38400 tty5
#6:23:respawn:/sbin/getty 38400 tty6
  • If you get “TERM variable not set”, set it, e.g. in /root/.bashrc:
TERM=linux
export TERM

Jessie Containers are tricky:

systemd prevents start.

Remove systemd, or prevent its installation before upgrading to jessie:

http://without-systemd.org/wiki/index.php/How_to_remove_systemd_from_a_Debian_jessie/sid_installation

Or fix it by adding this to the container config - this didn't always work, sometimes dbus errors appear when running apt:

# Custom container options
lxc.mount.auto = cgroup:mixed
lxc.mount.entry = tmpfs dev/shm tmpfs rw,nosuid,nodev,create=dir 0 0
lxc.mount.entry = tmpfs run tmpfs rw,nosuid,nodev,mode=755,create=dir 0 0
lxc.mount.entry = tmpfs run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,create=dir 0 0
# lxc.mount.entry = debugfs sys/kernel/debug debugfs rw,relatime 0 0
lxc.mount.entry = mqueue dev/mqueue mqueue rw,relatime,create=dir 0 0
# lxc.mount.entry = hugetlbfs dev/hugepages hugetlbfs rw,relatime,create=dir 0 0

Also make sure that you have the following line in your /etc/lxc/lxc.conf:

lxc.cgroup.use = @all

systemd cgroups fuckup

Could not find writable mount point for cgroup hierarchy 12 while trying to create cgroup

12 is a systemd hierarchy - if you remove systemd and switch to sysvinit-core, this might be leftover.

FIXME:

Check that all of systemd is gone (uninstall everything still marked ii):

dpkg -l '*systemd*'
apt remove --purge '*systemd*'    # without systemd apt/preferences.d/ must not be set

/etc/pam.d/common-session - comment out this line:

session     optional    pam_cgfs.so -c freezer,memory,name=systemd

Check, if 12 is still active:

cat /proc/self/cgroup
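
Each line of that file has the form ID:controllers:path, so the leftover hierarchy shows up as a line with name=systemd in the second field. A minimal sketch of the check; the function name has_systemd_cgroup and the sample lines are made up for this example:

```shell
# Hypothetical helper: succeeds if a name=systemd hierarchy is listed on stdin.
has_systemd_cgroup() {
    grep -q '^[0-9]*:name=systemd:'
}

# demo with sample lines; on a real host use: has_systemd_cgroup < /proc/self/cgroup
if printf '12:name=systemd:/\n4:memory:/\n' | has_systemd_cgroup; then
    echo "systemd hierarchy still present"
else
    echo "no systemd hierarchy"
fi
```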

WORKAROUND: mcedit /etc/lxc/lxc.conf and remove

lxc.cgroup.use = @all

  • this is helpful: https://github.com/lxc/lxc/issues/1279
  • this is not: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=769494


SSH Config

root login

The Debian Jessie default container has this option set in /etc/ssh/sshd_config, so you cannot log in as root with a password:

PermitRootLogin without-password

pam login error

error: PAM: pam_open_session(): Cannot make/remove an entry for the specified session

Comment out this line in /etc/pam.d/sshd:

# session    required     pam_loginuid.so 

or run this inside the container:

sed '/pam_loginuid.so/s/^/#/g' -i /etc/pam.d/* 
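
Before editing all of /etc/pam.d/ in place, the expression can be tried without -i, so the result is only printed. A minimal sketch on a scratch file; the function name comment_loginuid and the sample lines are made up for this example:

```shell
# Hypothetical helper: print the file with pam_loginuid.so lines commented out (no -i, nothing is changed).
comment_loginuid() {
    sed '/pam_loginuid\.so/s/^/#/' "$1"
}

tmp=$(mktemp)
printf '%s\n' 'session    required     pam_loginuid.so' \
              'session    required     pam_limits.so' > "$tmp"
comment_loginuid "$tmp"
rm -f "$tmp"
```

Note that running the in-place variant twice prefixes the line with a second #.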

FIXME maybe insecure

apt-get update broken via ipv6

apt-get update -o Acquire::ForceIPv4=true

permanent apt via ipv4:

echo 'Acquire::ForceIPv4 "true";' | tee /etc/apt/apt.conf.d/99force-ipv4

rsyslog error

TESTME

rsyslog doesn't start on boot and logs errors in syslog:

.. rsyslogd: imklog: cannot open kernel log(/proc/kmsg): Permission denied.
.. rsyslogd-2145: activation of module imklog failed [try http://www.rsyslog.com/e/2145 ]

Disable kernel logging in container /etc/rsyslog.conf:

# $ModLoad imklog   # provides kernel logging support


Move a root system into container

you can disable many services:

  • udev: the udev service (a hard dependency of systemd in Jessie) won't run in a container, but systemd recognizes this
  • apparmor: mounts securityfs; would require disabling the capability drop - bad idea
  • kmod
  • lm-sensors
  • dbus
  • kbd
  • hdparm

configuration changes

  • various sysctl.conf options do not work inside a container
  • module loading via /etc/modules does not work

If you get errors like:

INIT: Id "6" respawning too fast: disabled for 5 minutes

disable the matching line in /etc/inittab:

# 6:23:respawn:/sbin/getty 38400 tty6
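
The id in the error message is the first field of the inittab line. A minimal sketch that comments out the line for a given id, reading inittab text on stdin and printing the result; the function name disable_inittab_id and the sample lines are made up for this example:

```shell
# Hypothetical helper: comment out the inittab entry whose first field is $1.
disable_inittab_id() {
    sed "/^$1:/s/^/# /"
}

printf '%s\n' '5:23:respawn:/sbin/getty 38400 tty5' \
              '6:23:respawn:/sbin/getty 38400 tty6' | disable_inittab_id 6
```

On a real container, feed it /etc/inittab and compare the output before replacing the file.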
 
linux/virtualization/lxc.txt · Last modified: 2019/02/10 16:22 by tkilla