tcpffee (/ˈtɒf.i/) is a sticky TCP load balancer.
Configuration is done through a single TOML file:
```toml
# IP address to bind to.
bind_addr = "0.0.0.0"
# Socket to bind the stats API to.
stats_bind_socket = "127.0.0.1:2340"
# Sockets to bind TCP listeners for inter-proxy communications.
# These default to localhost only for security, which is ideal for single-proxy deployments.
# Every proxy must be able to access all of these ports on every other proxy.
# Make sure these ports are not externally accessible!
#
# Note: These may be merged into fewer ports in future.
raft_healthchecks_logs_bind_socket = "127.0.0.1:2341"
raft_stick_table_logs_bind_socket = "127.0.0.1:2342"
raft_healthchecks_proposals_bind_socket = "127.0.0.1:2343"
raft_stick_table_proposals_bind_socket = "127.0.0.1:2344"
# Ports to listen on.
ports = [ 5280, 5281, 5282 ]
# How many seconds a stick table entry is valid for without activity.
expiration = 10
# Every group must have a unique ID, and it must stay the same throughout the
# life of the group.
[groups.0]
# `portmap` defines which hosts a port is forwarded to.
# This can be expressed as one IP address or as a port-ip mapping for finer
# control:
portmap = "127.0.0.1"
# Optional attributes used to drain or disable a server group.
# default: false
drain = false
# default: false
disabled = false
[groups.1]
portmap = "172.31.100.20"
# A group can have any number of healthchecks, including zero.
[[groups.1.healthchecks]]
# Healthchecks are TCP by default.
# How many checks can fail before the server group is drained or disabled, default: 3
fall = 3
# What to do to the *whole group* when a server fails its healthchecks.
# default: `Draining`, values: `Draining`, `Disabled`, `Up` (ignore healthcheck)
fall_action = "Draining"
# How many healthchecks must pass before the server group is resumed, default: 5
rise = 5
# How often to run the healthcheck (seconds between executions), default: 5
interval = 5
# Maximum seconds to wait for the check to complete before considering it failed,
# default: half of interval.
timeout = 1
[[groups.1.healthchecks]]
interval = 5
# Example of an HTTP healthcheck.
# HTTP mode is enabled by specifying at least one of the options starting with `http_`.
fall_action = "Disabled"
# The endpoint on the server to check health with, default: "/"
http_target = "/health"
# Any HTTP headers to attach to the request, default: {}
#http_headers = { Content-Type = "text/plain" }
# HTTP method to use, default: GET, values: GET, POST
http_method = "POST"
# HTTP request body, default: None
http_body = "Hello, World!"
# HTTP response code which indicates a healthy server, default: 200
http_status = 201
[groups.2]
# `portmap` defines which hosts a port is forwarded to.
# This can be expressed as one IP address or as a port-ip mapping for finer
# control:
[groups.2.portmap]
5280 = "172.31.100.198:5280"
5281 = "172.31.100.20:3487"
5282 = "172.31.100.3:5282"
# List of all peers, including the peer itself.
# If left empty, the proxy will run on its own.
[[peers]]
# Hostname that can be used to communicate with the peer from other peers.
# Can also be an IP address.
host = "hostname.domain"
# MAC address of a network interface on the peer.
# Used to assign peers their node ID, and for peers to identify which peer they are.
mac_address = "00:00:00:00:00:01"
[[peers]]
host = "hostname.domain"
mac_address = "00:00:00:00:00:02"
```

There are also some settings that are set through environment variables:

| Environment Variable | Description | Default |
|---|---|---|
| `TCPFFEE_CONFIG_PATH` | Path to the config file | `/etc/tcpffee/tcpffee.toml` |
| `TCPFFEE_PID_PATH` | Path to the file with the PID of tcpffee | `/run/tcpffee.pid` |
| `TCPFFEE_HEALTHCHECK_SAVE_PATH` | Path to the file the healthcheck state is read from and written to | `/var/lib/tcpffee/healthcheck_state` |
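
A minimal sketch of how the lookups in the table above behave: each variable, if set, overrides the listed default. This is not tcpffee's source; `Paths` and `resolve_paths` are hypothetical names, and the environment is passed in as a closure purely so the sketch can be tested without mutating the real process environment.

```rust
use std::path::PathBuf;

/// Hypothetical container for the paths tcpffee takes from the environment.
#[derive(Debug, PartialEq)]
pub struct Paths {
    pub config: PathBuf,
    pub pid: PathBuf,
    pub healthcheck_state: PathBuf,
}

/// Resolve each path from `lookup`, falling back to the documented default.
/// `lookup` stands in for the process environment (an assumption for testability).
pub fn resolve_paths<F>(lookup: F) -> Paths
where
    F: Fn(&str) -> Option<String>,
{
    // Use the variable's value if present, otherwise the default from the table.
    let get = |key: &str, default: &str| -> PathBuf {
        PathBuf::from(lookup(key).unwrap_or_else(|| default.to_string()))
    };
    Paths {
        config: get("TCPFFEE_CONFIG_PATH", "/etc/tcpffee/tcpffee.toml"),
        pid: get("TCPFFEE_PID_PATH", "/run/tcpffee.pid"),
        healthcheck_state: get(
            "TCPFFEE_HEALTHCHECK_SAVE_PATH",
            "/var/lib/tcpffee/healthcheck_state",
        ),
    }
}
```

In a real binary the closure would be `|k| std::env::var(k).ok()`.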

| Environment Variable | Values |
|---|---|
| `TCPFFEE_KEY_TYPE` | `IP`, `IP_WITH_FIRST_MESSAGE_SLICE` |
| `TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START` | Positive integer |
| `TCPFFEE_KEY_FIRST_MESSAGE_SLICE_END` | Positive integer greater than `TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START` |
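
The constraints in this table can be checked once at startup. The sketch below is hypothetical (`KeyType` and `key_type_from_env` are illustrative names, not tcpffee's API). It assumes an unset `TCPFFEE_KEY_TYPE` means `IP`, the default key, and reads "positive integer" literally, so `0` is rejected.

```rust
/// Key types listed for `TCPFFEE_KEY_TYPE` (hypothetical representation).
#[derive(Debug, PartialEq)]
pub enum KeyType {
    Ip,
    IpWithFirstMessageSlice { start: usize, end: usize },
}

/// Validate the key settings. `lookup` stands in for the process environment.
pub fn key_type_from_env<F>(lookup: F) -> Result<KeyType, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Read a required variable and check that it is a positive integer.
    let positive = |name: &str| -> Result<usize, String> {
        let raw = lookup(name).ok_or_else(|| format!("{name} must be set"))?;
        match raw.trim().parse::<usize>() {
            Ok(n) if n > 0 => Ok(n),
            _ => Err(format!("{name} must be a positive integer")),
        }
    };

    match lookup("TCPFFEE_KEY_TYPE").as_deref() {
        // Assumption: unset behaves like the default IP key.
        None | Some("IP") => Ok(KeyType::Ip),
        Some("IP_WITH_FIRST_MESSAGE_SLICE") => {
            let start = positive("TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START")?;
            let end = positive("TCPFFEE_KEY_FIRST_MESSAGE_SLICE_END")?;
            if end <= start {
                return Err("TCPFFEE_KEY_FIRST_MESSAGE_SLICE_END must be greater than \
                            TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START"
                    .to_string());
            }
            Ok(KeyType::IpWithFirstMessageSlice { start, end })
        }
        Some(other) => Err(format!("unknown TCPFFEE_KEY_TYPE: {other}")),
    }
}
```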
The proxy key is used to identify client machines and evenly distribute their load. By default, the key used is the client's IP address.
Be aware that when there are many IPv4 clients behind NAT, this may not distribute load evenly, because every machine sharing a public address gets the same key. Additionally, if the drain feature must be used with zero user interruption, a server may never be fully drained. This can happen if several client machines behind the same NAT have partially overlapping, long-running sessions with the proxied servers, since the shared stick table entry never stays inactive long enough to expire. An example of this could be a heartbeat to a license server.
To mitigate this, the key type can be set to `IP_WITH_FIRST_MESSAGE_SLICE`.
`TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START` and `TCPFFEE_KEY_FIRST_MESSAGE_SLICE_END` must also be set.
This key type creates the key by appending a slice of the first message the client program sends
to the client's IP address.
This helps avoid load balancing issues with NAT if the client program's first message always has
the same format and identifies the client with something unique (such as a username or hostname).
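
To illustrate why this separates clients behind one NAT, here is a hedged sketch of the key construction. `make_key` is a hypothetical helper, not tcpffee's code. It assumes the key is a byte vector made of the IP's textual form followed directly by bytes `start..end` of the first message, and that a message shorter than `end` contributes only the bytes it has.

```rust
use std::net::IpAddr;

/// Build a stick table key from the client IP plus a slice of its first message.
/// Assumptions: textual IP, no separator, and a short message is truncated
/// rather than rejected.
pub fn make_key(ip: IpAddr, first_message: &[u8], start: usize, end: usize) -> Vec<u8> {
    let mut key = ip.to_string().into_bytes();
    // Clamp the end of the slice to the message length.
    let end = end.min(first_message.len());
    if start < end {
        key.extend_from_slice(&first_message[start..end]);
    }
    key
}
```

With `start = 6` and `end = 16`, two clients sharing the address `203.0.113.7` that send `HELLO user=alice` and `HELLO user=bob` get the different keys `203.0.113.7user=alice` and `203.0.113.7user=bob`, so they can be stuck to different servers.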
The log level can be set using the `RUST_LOG` environment variable.
More information can be found in the `env_logger` documentation.
A server group can be drained by setting `drain = true` in its config, or automatically when it
fails its healthchecks.
Similarly, a group can be disabled by setting `disabled = true`, or when it fails a healthcheck whose
`groups.<id>.healthchecks[].fall_action` is set to `Disabled`; otherwise failing groups are drained by default.
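
The healthcheck-driven part of this can be sketched as a small state machine using the `fall`, `rise` and `fall_action` settings from the example config. This is an illustration, not tcpffee's implementation: `HealthTracker`, `GroupState` and `FallAction` are hypothetical, and the sketch assumes `fall` and `rise` count consecutive results, as HAProxy's options of the same name do. It models a single healthcheck and ignores the static `drain`/`disabled` settings.

```rust
/// Values of `fall_action`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FallAction {
    Draining,
    Disabled,
    Up, // ignore the healthcheck
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GroupState {
    Up,
    Draining,
    Disabled,
}

/// Tracks consecutive results of one healthcheck (assumed semantics).
pub struct HealthTracker {
    fall: u32,
    rise: u32,
    fall_action: FallAction,
    failures: u32,
    successes: u32,
    state: GroupState,
}

impl HealthTracker {
    pub fn new(fall: u32, rise: u32, fall_action: FallAction) -> Self {
        Self { fall, rise, fall_action, failures: 0, successes: 0, state: GroupState::Up }
    }

    /// Record one check result and return the resulting group state.
    pub fn record(&mut self, passed: bool) -> GroupState {
        if passed {
            self.failures = 0;
            self.successes = self.successes.saturating_add(1);
            // Resume only after `rise` consecutive passes.
            if self.state != GroupState::Up && self.successes >= self.rise {
                self.state = GroupState::Up;
            }
        } else {
            self.successes = 0;
            self.failures = self.failures.saturating_add(1);
            // Apply `fall_action` after `fall` consecutive failures.
            if self.failures >= self.fall {
                self.state = match self.fall_action {
                    FallAction::Draining => GroupState::Draining,
                    FallAction::Disabled => GroupState::Disabled,
                    FallAction::Up => GroupState::Up,
                };
            }
        }
        self.state
    }
}
```

With the example values (`fall = 3`, `rise = 5`), a group is drained on the third consecutive failure and only resumes after five consecutive passes.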
Configuration can be reloaded by sending the `SIGUSR1` signal to the process.
The PID of the process is saved to `/run/tcpffee.pid` by default; this path can be overridden with
the `TCPFFEE_PID_PATH` environment variable.
Reloading the configuration will not cause any connections to be dropped, assuming that it is
correctly configured.
If the syntax or schema is wrong, the program will log this and continue with the old
configuration.
Live reloading can be used to dynamically change the configuration, such as adding/removing servers.
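
A reload can be scripted by reading the PID file and sending `SIGUSR1`. The sketch below is a hypothetical helper, not part of tcpffee: it assumes the PID file contains just the decimal PID, and it shells out to the standard `kill` utility (`kill -s USR1 <pid>`), which must be on `PATH`, to avoid depending on a signal-handling crate.

```rust
use std::fs;
use std::io;
use std::num::ParseIntError;
use std::process::Command;

/// Parse the contents of the PID file (assumed to be a bare decimal PID).
pub fn parse_pid(contents: &str) -> Result<u32, ParseIntError> {
    contents.trim().parse()
}

/// Ask the running proxy to reload its configuration.
pub fn reload(pid_path: &str) -> io::Result<()> {
    let contents = fs::read_to_string(pid_path)?;
    let pid = parse_pid(&contents).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let pid_arg = pid.to_string();
    // Equivalent to running `kill -s USR1 <pid>` in a shell.
    let status = Command::new("kill").args(["-s", "USR1", pid_arg.as_str()]).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, format!("kill exited with {status}")))
    }
}
```

For example, `reload("/run/tcpffee.pid")` with the default PID path. If the new configuration is invalid, the proxy logs the error and keeps running with the old one, so the result of `reload` only indicates whether the signal was delivered.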
tcpffee comes with an HTTP API for accessing statistics and information about the proxy. It can currently be accessed at:

- `STATS_BIND_SOCKET/api/v1/`
- `STATS_BIND_SOCKET/ui/v1/` (for an interactive way to explore the API)
- `STATS_BIND_SOCKET/spec/v1/` (for the OpenAPI specification)

where `STATS_BIND_SOCKET` is the `stats_bind_socket` value set in the proxy configuration. The default socket only listens on localhost. Be careful when exposing it, as the entire contents of the stick table are available through the API.
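
For a quick check without extra dependencies, the OpenAPI specification can be fetched with a raw HTTP/1.1 request. This is a minimal sketch using only the standard library; `build_get` and `fetch` are hypothetical helpers, they return the raw response (status line and headers included, and possibly chunked), and a real client would normally use an HTTP library instead. The individual endpoints under `/api/v1/` are not listed here; the specification describes them.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Build a minimal HTTP/1.1 GET request that asks the server to close the connection.
pub fn build_get(host: &str, path: &str) -> String {
    format!("GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n")
}

/// Fetch `path` from the stats API and return the raw HTTP response as text.
pub fn fetch(stats_bind_socket: &str, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(stats_bind_socket)?;
    // Avoid hanging forever if the server keeps the connection open.
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    stream.write_all(build_get(stats_bind_socket, path).as_bytes())?;
    let mut response = String::new();
    // `Connection: close` lets us read until the server closes the socket.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the default configuration this would be called as `fetch("127.0.0.1:2340", "/spec/v1/")`.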

- Performance
- Thorough testing of healthchecks
- Save stick table to disk at regular intervals
- Test proxy smooth shutdown
- Transparent proxying
- Use raft to synchronise state between proxy instances
  - Implement generic raft storage for use in proxy
  - Implement generic raft node for use in proxy
  - Implement communication for raft node
  - Synchronise stick tables using raft (replace postgresql)
    - Leader should reject stick table additions for entries that have not expired
  - Leader server holds elections on health of server groups to decide if they are healthy
  - Servers which falsely report server groups as healthy get shut down (they now have an endpoint to report being unhealthy instead)
- Testing
- Connection pooling: Done!
- ~~Live configuration editing API/config hot reloading~~ Config hot reloading done!
- Logging: Done!
- Cache database queries: Done!
- Drain option as a config line: Done!
- Healthchecks: Done!
  - Automatic draining for failed healthchecks: Done!
  - Fix cache not being updated when a client is redirected because it was previously directed to a disabled server: Fixed!
  - Healthcheck status is restored on server restart: Done!
  - Ensure healthchecks are up to date on config reload: Done!
- Add timeouts to healthchecks