Configuration Overview
Lithos has 4 configs:
- /etc/lithos/master.yaml: global configuration for the whole lithos daemon. An empty config should work most of the time. See Master Config.
- /etc/lithos/sandboxes/<NAME>.yaml: the allowed paths and other system limits for every sandbox. You may think of a sandbox as a single application. See Sandbox Config.
- /etc/lithos/processes/<NAME>.yaml: you may think of it as a list of pairs (image_name, num_of_processes_to_run). It's only a tiny bit longer than that. See Process Config.
- <IMAGE>/config/<NAME>.yaml: configuration of the process to run. It contains everything needed to run the process. It's stored inside the image (so it's updated together with the image) and is constrained by the limits in the sandbox config. See Container Configuration.
Four configs may look superfluous, but they aren't hard to write. Let's see why each of them is needed.
Separation of Concerns
There are three roles which influence lithos containers.
Developer
The developer is the owner of the service. They need to configure as much as possible for their container, with the following limitations:
- They can't break into the host system, even in the case of:
  - remote code execution (RCE) vulnerabilities in the code;
  - a malicious party controlling the source code, configuration and binary artifacts running inside the containers.
- The service must run on any host without changes. This includes different filesystem layouts of the host system (e.g. different names/numbers of disks).
The developer's config, which we call the container config, ships inside the container itself. It usually defines the command line, environment and resource limits:
executable: /usr/bin/python3.6
arguments:
- myapp.py
- --port=8080
work-dir: /app
environ:
  LANG: en_US.utf-8
memory-limit: 100Mi
fileno-limit: 1k
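Limits such as memory-limit and fileno-limit are written with unit suffixes. The sketch below shows how such values might be turned into plain numbers, assuming the common convention where k/M/G are powers of 1000 and Ki/Mi/Gi are powers of 1024. The helper parse_size is hypothetical and is not lithos's own parser.

```python
# Assumed suffix convention: k/M/G are powers of 1000, Ki/Mi/Gi are powers
# of 1024. Illustration only; parse_size is a hypothetical helper.
UNITS = {
    "k": 1000,
    "M": 1000**2,
    "G": 1000**3,
    "Ki": 1024,
    "Mi": 1024**2,
    "Gi": 1024**3,
}


def parse_size(value: str) -> int:
    """Convert a limit such as '100Mi' or '1k' into a plain integer."""
    # Check longer suffixes first so a two-letter suffix always wins.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * UNITS[suffix]
    return int(value)  # no suffix: the number is taken literally


container = {"memory-limit": "100Mi", "fileno-limit": "1k"}
memory_bytes = parse_size(container["memory-limit"])  # 100 * 2**20
max_open_files = parse_size(container["fileno-limit"])
```

Under this assumption, 100Mi is 104857600 bytes and 1k is 1000 file descriptors.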
That's almost all there is to it. Sometimes a container also needs persistent disk storage:
executable: /usr/bin/python3.6
arguments:
- myapp.py
- --port=8080
work-dir: /app
environ:
  LANG: en_US.utf-8
volumes:
  /var/lib/sqlite: !Persistent /db
memory-limit: 100Mi
fileno-limit: 1k
Note the following things:
- It doesn't define where the filesystem root is, because the config itself lies inside the filesystem root.
- Volumes don't specify a path in the host filesystem; they use a virtual path (/db in this case). Otherwise the config would depend on the exact filesystem layout of the host system, which in some cases might be a vulnerability (or at least an exposure of unnecessary data). Later we'll describe how the virtual path is mapped to the real filesystem.
The container operates inside a sandbox defined by platform maintainers.
Platform Maintainer
Platform maintainers define how containers are run. They write the sandbox config and the master config.
The former defines the sandbox for a specific application. Let's see an example (don't use it in production, see below):
image-dir: /opt/app1-images
allow-users: [100000-165535]
allow-groups: [100000-165535]
default-user: 100000
default-group: 100000
This says that images of this application are in /opt/app1-images and that it's allowed to use user ids in the range 100000-165535.
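To make the allow-users semantics concrete, here is a small sketch that checks whether a user id falls into the allowed entries. It assumes entries are either plain ids or inclusive "low-high" ranges (YAML reads 100000-165535 as a string, not an arithmetic expression); parse_ranges and is_allowed are hypothetical names, not part of lithos.

```python
def parse_ranges(entries):
    """Turn entries like '100000-165535' or 1 into inclusive (low, high) pairs."""
    ranges = []
    for entry in entries:
        text = str(entry)
        if "-" in text:
            low, high = text.split("-", 1)
            ranges.append((int(low), int(high)))
        else:
            ranges.append((int(text), int(text)))
    return ranges


def is_allowed(uid, entries):
    """True if uid falls into any allowed range (bounds assumed inclusive)."""
    return any(low <= uid <= high for low, high in parse_ranges(entries))


sandbox = {
    "allow-users": ["100000-165535"],
    "default-user": 100000,
}
```

A config where default-user is not itself allowed would be inconsistent, so a check like is_allowed(sandbox["default-user"], sandbox["allow-users"]) is a cheap sanity test.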
User Namespaces
The first thing to configure here is a separate user namespace per application:
image-dir: /opt/app1-images
uid-map:
- { inside: 0, outside: 10002, count: 2 }
gid-map:
- { inside: 0, outside: 10002, count: 2 }
allow-users: [1]
allow-groups: [1]
default-user: 1
default-group: 1
Note the following things:
- We introduced a uid/gid map. This means that the two host users starting with user id 10002 (i.e. 10002 and 10003) will be users 0 and 1 in the container.
- Allowed and default users are set relative to the container ids, not the host system ones.
- We allow only a single user id and group id in the container, and this is number 1 (i.e. the first non-root user).
- This scheme works for 99% of applications. But if you need containers in containers or some other specific scenario, you can enlarge uid-map and the allowed groups as much as the OS allows.
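The uid-map entries follow the usual Linux user-namespace rule: an id inside the container is translated by its offset from inside, as long as that offset is below count. The sketch below shows that arithmetic for the example map; to_host_id is a hypothetical helper, not a lithos function.

```python
def to_host_id(container_id, id_map):
    """Translate an id inside the container to the host id via map entries."""
    for entry in id_map:
        offset = container_id - entry["inside"]
        if 0 <= offset < entry["count"]:
            return entry["outside"] + offset
    raise ValueError(f"id {container_id} is not mapped")


uid_map = [{"inside": 0, "outside": 10002, "count": 2}]
```

With this map, container uid 0 is host uid 10002, container uid 1 is host uid 10003, and any other id is unmapped.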
The id 10002 is arbitrary; you can use any other one. For security and monitoring
purposes you should keep separate user ids for each app. Whether they are the
same across the cluster or allocated on each node is irrelevant unless you
have a shared filesystem between machines. Keeping the same uids across the
cluster is still recommended for easier monitoring and debugging.
You can allow uid 0 too. When using uid namespaces it should not
grant any elevated privileges. But it allows creating mountpoints, spawning
other namespaces and doing lots of other things that create a larger attack surface.
This has caused vulnerabilities due to kernel bugs in the past.
Filesystem
As you have already seen, the sandbox config defines where container base directories are located:
image-dir: /opt/app1-images
image-dir-levels: 1 # default value
In this config, directories named like /opt/app1-images/some-name1
serve as the root directory for containers (we'll show later how to find out
which specific directory is currently used). They are mounted read-only. With this
config:
image-dir: /opt/app2-images
image-dir-levels: 2
images are located in /opt/app2-images/service1/version1, i.e. two
directory components below the image dir. Any value of image-dir-levels
can be used, though only a fixed number of components is supported for each
specific sandbox.
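The rule "exactly image-dir-levels components below image-dir" can be sketched as a small path check. This assumes images are given as relative paths (as in the process config later on); rejecting absolute paths and ".." components is this sketch's own safety choice, and image_root is a hypothetical name.

```python
from pathlib import PurePosixPath


def image_root(image_dir, image, levels=1):
    """Map an image name to its root directory, enforcing image-dir-levels."""
    path = PurePosixPath(image)
    if path.is_absolute():
        raise ValueError("image must be relative to image-dir")
    if ".." in path.parts:
        raise ValueError("image must not escape image-dir")
    if len(path.parts) != levels:
        raise ValueError(
            f"expected {levels} path component(s), got {len(path.parts)}"
        )
    return str(PurePosixPath(image_dir) / path)
```

For example, image_root("/opt/app2-images", "service1/version1", levels=2) yields /opt/app2-images/service1/version1, while a single component such as "service1" is rejected for that sandbox.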
Extra directories can be specified as follows:
readonly-paths:
  /timezones: /usr/share/timezones
writable-paths:
  /db: /var/lib/app1-database
The keys on the left are virtual paths. Containers can mount them by referencing them in the container config:
executable: /usr/bin/python3.6
arguments:
- myapp.py
- --port=8080
volumes:
  /etc/timezones: !Readonly /timezones
  /var/lib/sqlite: !Persistent /db
This allows platform maintainers to move directories around in the host system and map different directories on different systems without ever interfering with the container.
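The indirection can be sketched as a lookup from each container volume's virtual path into the sandbox tables. This is a simplification under stated assumptions: the YAML tags are modelled as plain strings, !Readonly is assumed to resolve only through readonly-paths and !Persistent only through writable-paths, and only exact virtual paths are matched. resolve_volumes and TABLE_FOR_TAG are hypothetical.

```python
SANDBOX = {
    "readonly-paths": {"/timezones": "/usr/share/timezones"},
    "writable-paths": {"/db": "/var/lib/app1-database"},
}

# Container volumes: mount point -> (volume tag, virtual path).
VOLUMES = {
    "/etc/timezones": ("Readonly", "/timezones"),
    "/var/lib/sqlite": ("Persistent", "/db"),
}

# Assumption: each tag is looked up in exactly one sandbox table.
TABLE_FOR_TAG = {
    "Readonly": ("readonly-paths", True),
    "Persistent": ("writable-paths", False),
}


def resolve_volumes(volumes, sandbox):
    """Return mount point -> (host path, read_only) for a container."""
    mounts = {}
    for mount_point, (tag, virtual) in volumes.items():
        table, read_only = TABLE_FOR_TAG[tag]
        host_path = sandbox.get(table, {}).get(virtual)
        if host_path is None:
            raise ValueError(f"{virtual} is not in the sandbox's {table}")
        mounts[mount_point] = (host_path, read_only)
    return mounts
```

Moving /var/lib/app1-database on the host then only means editing SANDBOX; VOLUMES, which lives in the image, stays the same.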
Network
The sandbox also contains network configuration. By default all containers use the host network (i.e. they operate in the same network namespace, just like non-containerized processes).
There is also support for bridged network:
bridged-network:
  bridge: br0
  network: 10.64.0.0/16
  default-gateway: 10.64.255.254
  after-setup-command: [/usr/bin/arping, -U, -c1, '@{container_ip}']
This enables network isolation for containers. Every container in the sandbox has its own network config with a separate IP address (see below which one), but all of them derive their configuration from the sandbox config.
Different sandboxes may have the same or different bridged network configs.
See the reference for more info.
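The sketch below illustrates how a per-container network setup might be derived from the shared sandbox settings: the address gets the sandbox's prefix length and gateway, and '@{container_ip}' in after-setup-command is replaced with the container's IP. Which checks lithos actually performs is not stated here; rejecting the gateway, network and broadcast addresses is this sketch's assumption, and container_network is a hypothetical name.

```python
import ipaddress

BRIDGED = {
    "bridge": "br0",
    "network": "10.64.0.0/16",
    "default-gateway": "10.64.255.254",
    "after-setup-command": ["/usr/bin/arping", "-U", "-c1", "@{container_ip}"],
}


def container_network(sandbox_net, ip):
    """Derive one container's network settings from the sandbox config."""
    network = ipaddress.ip_network(sandbox_net["network"])
    addr = ipaddress.ip_address(ip)
    gateway = ipaddress.ip_address(sandbox_net["default-gateway"])
    if addr not in network:
        raise ValueError(f"{addr} is outside {network}")
    if addr in (gateway, network.network_address, network.broadcast_address):
        raise ValueError(f"{addr} is reserved in {network}")
    # Substitute the placeholder in every argument of the setup command.
    command = [
        arg.replace("@{container_ip}", str(addr))
        for arg in sandbox_net["after-setup-command"]
    ]
    return {
        "bridge": sandbox_net["bridge"],
        "address": f"{addr}/{network.prefixlen}",
        "gateway": str(gateway),
        "after-setup-command": command,
    }
```

For 10.64.0.10 this yields the address 10.64.0.10/16 behind gateway 10.64.255.254, and the setup command becomes arping for 10.64.0.10.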
Master Config
Along with sandbox configs, the master config is also part of the "platform maintainer" zone of responsibility. It contains things that are common to all containers and is usually the same across the cluster.
You may run with an empty config, but most commonly it contains the cgroup controllers for lithos to manage:
cgroup-controllers: [name,cpu,memory]
This is needed for lithos to correctly support memory limits and CPU quotas.
You might also want to set config-log-dir to null if you don't use lithos_clean:
cgroup-controllers: [name,cpu,memory]
config-log-dir: null
Usually, you don't need to set anything else. There are various directories to configure in case you have a non-standard filesystem layout. See the reference for the full list of settings.
Orchestration System
The last part of the configuration is the thing that ties sandboxes, images and container configs together. We call it the process config.
The general idea is that this config is created by an orchestration system, i.e. a system that decides where, which version and how many processes to run. This can be a full-blown system like verwalter or just an ansible/chef/salt/bash script that writes the required configs.
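As an illustration of the "just a script" option, here is a minimal sketch of an orchestration step that renders a process config and writes it into place. It avoids a YAML library by emitting values with json.dumps (JSON strings, numbers and lists are also valid YAML), and writes via a temporary file plus os.replace so a reader never sees a half-written file. The function names and the target file name are hypothetical; the real location would be /etc/lithos/processes/<NAME>.yaml.

```python
import json
import os
import tempfile


def render_process_config(services):
    """Render {name: {key: value}} as YAML text (values emitted as JSON)."""
    lines = []
    for name, fields in services.items():
        lines.append(f"{name}:")
        for key, value in fields.items():
            lines.append(f"  {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise
```

Keeping the temporary file in the same directory matters: os.replace is only atomic when source and destination are on the same filesystem.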
Basically it looks like:
web-worker:
  kind: Daemon
  image: web-wrk/d7399260
  config: "/config/web-worker.yaml"
  instances: 2
background-worker:
  kind: Daemon
  image: task-queue/d7399260
  config: "/config/task-queue.yaml"
  instances: 3
Here we run two kinds of services: "web worker" with 2 instances (identical processes/containers) and "background worker" with 3 instances.
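In other words, the process config expands into one container per instance. The sketch below makes that expansion explicit for the example above; the (name, instance number, image) tuple shape and the expand function are hypothetical and say nothing about how lithos names its containers internally.

```python
processes = {
    "web-worker": {"image": "web-wrk/d7399260", "instances": 2},
    "background-worker": {"image": "task-queue/d7399260", "instances": 3},
}


def expand(processes):
    """One (name, instance number, image) tuple per container to run."""
    return [
        (name, index, spec["image"])
        for name, spec in processes.items()
        for index in range(spec["instances"])
    ]


containers = expand(processes)
```

For the example this gives five containers in total: two web workers followed by three background workers.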
The image is a directory path relative to image-dir. The path must
contain the number of path components specified in image-dir-levels.
It is also expected that the directory is immutable, so each new version
of the container is run from a different directory, and the directory path
contains some notion of the container version.
config is a path inside the container. There is no limit on how many
configs might be in the same container, and not all of them need to be running
at any given moment.
There are a few other things that can be configured in this config. If you're using bridged networking, you need to specify an IP address for each container:
web-worker:
  kind: Daemon
  image: web-wrk/d7399260
  config: "/config/web-worker.yaml"
  instances: 2
  ip_addresses:
    - 10.64.0.10
    - 10.64.0.11
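Since each container gets its own address, an orchestration script may want to check the list before writing it out. The sketch below assumes one unique address per instance and pairs them by position; both the one-to-one rule and the assign_ips helper are assumptions for illustration, not documented lithos behaviour.

```python
def assign_ips(spec):
    """Pair each instance number with its address from ip_addresses."""
    ips = spec.get("ip_addresses", [])
    if len(ips) != spec["instances"]:
        raise ValueError(
            f"{spec['instances']} instances but {len(ips)} ip_addresses"
        )
    if len(set(ips)) != len(ips):
        raise ValueError("ip_addresses must be unique")
    return dict(enumerate(ips))


web_worker = {"instances": 2, "ip_addresses": ["10.64.0.10", "10.64.0.11"]}
```

For the example, instance 0 gets 10.64.0.10 and instance 1 gets 10.64.0.11.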
And sometimes containers allow customizing their config with variables:
background-worker:
  kind: Daemon
  image: task-queue/d7399260
  config: "/config/task-queue.yaml"
  instances: 3
  variables:
    queue_name: "main-queue"
See the reference for more info.
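To show what such customization could look like, here is a sketch of substituting variables into container arguments. It assumes the container config refers to variables with the same @{name} placeholder syntax seen in after-setup-command, and that a missing variable is an error; the substitute function and the worker.py / --queue= arguments are hypothetical.

```python
import re

# Assumed placeholder syntax: @{name}, as in '@{container_ip}' above.
PLACEHOLDER = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(template, variables):
    """Replace every @{name} in template with its value from variables."""

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"variable {name!r} is not provided")
        return str(variables[name])

    return PLACEHOLDER.sub(lookup, template)


arguments = ["worker.py", "--queue=@{queue_name}"]
variables = {"queue_name": "main-queue"}
rendered = [substitute(arg, variables) for arg in arguments]
```

With the process config above, the second argument would become --queue=main-queue; failing loudly on an unknown name keeps a typo in the process config from silently producing a broken command line.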