This repository was archived by the owner on Feb 8, 2026. It is now read-only.

alvierahman90/tcpffee

tcpffee

/ˈtɒf.i/, a sticky TCP load balancer.

Setup and Configuration

Configuration is done through a single TOML file:

# IP address to bind to.
bind_addr = "0.0.0.0"
# Socket to bind stats API to.
stats_bind_socket = "127.0.0.1:2340"
# Sockets to bind TCP listeners for inter-proxy communications.
# Defaults to localhost only for security, as is ideal for single proxy deployments.
# Every proxy must be able to access all of these ports on every other proxy.
# Make sure these ports are not externally accessible!
#
# Note: These may be merged into fewer ports in future.
raft_healthchecks_logs_bind_socket="127.0.0.1:2341"
raft_stick_table_logs_bind_socket="127.0.0.1:2342"
raft_healthchecks_proposals_bind_socket="127.0.0.1:2343"
raft_stick_table_proposals_bind_socket="127.0.0.1:2344"
# Ports to listen on.
ports = [ 5280, 5281, 5282 ]
# How many seconds a stick table entry is valid for without activity
expiration = 10


# Every group must have a unique ID and it must stay the same throughout the
# life of the group.
[groups.0]
# `portmap` defines which hosts a port is forwarded to.
# This can be expressed as one IP address or as a port-ip mapping for finer
# control:
portmap = "127.0.0.1"

# Optional attributes used to drain or disable a server group
# default: false
drain = false
# default: false
disabled = false

[groups.1]
portmap = "172.31.100.20"

# A group can have any number of healthchecks, including zero.
[[groups.1.healthchecks]]
# Healthchecks are TCP by default.
# How many checks can fail before server group is drained or disabled, default: 3
fall = 3
# What to do to the *whole group* when a server fails its healthchecks.
# default: `Draining`, values: `Draining`, `Disabled`, `Up` (ignore healthcheck)
fall_action = "Draining"
# How many healthchecks must pass before server group is resumed, default: 5
rise = 5
# How often to run the healthcheck (seconds between execution), default: 5.
interval = 5
# Maximum seconds to wait for the check to complete before considering it a fail,
# default: half of interval.
timeout = 1

[[groups.1.healthchecks]]
# Example of an HTTP healthcheck.
# HTTP mode can be set by specifying at least one of the options starting with `http_`.
interval = 5
fall_action = "Disabled"
# The endpoint on the server to check health with, default: "/"
http_target = "/health"
# Any HTTP headers to attach to the request, default: {}
#http_headers = { Content-Type = "text/plain" }
# HTTP method to use, default: GET, values: GET, POST
http_method = "POST"
# HTTP request body, default: None
http_body = "Hello, World!"
# HTTP response code which indicates a healthy server, default: 200
http_status = 201


[groups.2]
# `portmap` defines which hosts a port is forwarded to.
# This can be expressed as one IP address or as a port-ip mapping for finer
# control:
[groups.2.portmap]
5280 = "172.31.100.198:5280"
5281 = "172.31.100.20:3487"
5282 = "172.31.100.3:5282"

# List of all peers, including the peer itself.
# If left empty, the proxy will run on its own.
[[ peers ]]
# Hostname that can be used to communicate with the peer from other peers.
# Can also be IP address.
host = "hostname.domain"
# MAC address of a network interface on the peer.
# Used to assign peers their node ID, and for peers to identify which peer they are.
mac_address = "00:00:00:00:00:01"

[[ peers ]]
host = "hostname.domain"
mac_address = "00:00:00:00:00:02"
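
The two `portmap` forms in the config above can be illustrated with a small sketch. It is not tcpffee's own code: `resolve_backend` is a hypothetical helper, and it assumes that a single-IP `portmap` forwards to the same port the proxy listens on, which the config comments imply but do not state.

```python
from typing import Mapping, Tuple, Union

# A portmap is either one IP address or a mapping of listen port to "ip:port".
Portmap = Union[str, Mapping[str, str]]


def resolve_backend(portmap: Portmap, listen_port: int) -> Tuple[str, int]:
    """Return the (host, port) a connection on listen_port would be forwarded to.

    Assumption: with a single-IP portmap the backend port equals the listen port.
    """
    if isinstance(portmap, str):
        return portmap, listen_port
    # TOML table keys are strings, so the port is looked up by its string form.
    target = portmap[str(listen_port)]
    host, _, port = target.rpartition(":")
    return host, int(port)


# The portmaps of groups 0 and 2 from the example config, as parsed TOML would look.
group_0 = "127.0.0.1"
group_2 = {
    "5280": "172.31.100.198:5280",
    "5281": "172.31.100.20:3487",
    "5282": "172.31.100.3:5282",
}

print(resolve_backend(group_0, 5281))  # ('127.0.0.1', 5281)
print(resolve_backend(group_2, 5281))  # ('172.31.100.20', 3487)
```

In this sketch a listen port missing from a mapping raises `KeyError`, and IPv6 targets are not handled. How tcpffee treats either case is not documented here.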

There are also some settings that are set through environment variables:

| Environment Variable | Description | Default |
| --- | --- | --- |
| TCPFFEE_CONFIG_PATH | Path to config file | /etc/tcpffee/tcpffee.toml |
| TCPFFEE_PID_PATH | Path to file with the PID of tcpffee | /run/tcpffee.pid |
| TCPFFEE_HEALTHCHECK_SAVE_PATH | Path to file to read from/write to the state of healthchecks | /var/lib/tcpffee/healthcheck_state |
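
For wrapper scripts that need to find the same files as the proxy, the defaults listed above can be mirrored in a few lines. This is a sketch only: `effective_paths` is a hypothetical helper, and it assumes an unset variable falls back to its default while a set one is used verbatim.

```python
import os
from typing import Dict, Mapping

# Defaults copied from the environment variable table.
DEFAULTS = {
    "TCPFFEE_CONFIG_PATH": "/etc/tcpffee/tcpffee.toml",
    "TCPFFEE_PID_PATH": "/run/tcpffee.pid",
    "TCPFFEE_HEALTHCHECK_SAVE_PATH": "/var/lib/tcpffee/healthcheck_state",
}


def effective_paths(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each setting, taken from env when set and from DEFAULTS otherwise."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


for name, value in effective_paths().items():
    print(f"{name}={value}")
```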

Proxy Key Types

| Environment Variable | Values |
| --- | --- |
| TCPFFEE_KEY_TYPE | IP, IP_WITH_FIRST_MESSAGE_SLICE |
| TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START | Positive integer |
| TCPFFEE_KEY_FIRST_MESSAGE_SLICE_END | Positive integer greater than TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START |

The proxy key is used to identify client machines and evenly distribute their load. By default, the key used is the client's IP address.

Be aware that this may not distribute load evenly when many IPv4 clients share one address behind NAT. Additionally, if the drain feature needs to be used with zero user interruption, a server may never be fully drained. This can happen when client machines behind the same NAT have several long-running, partially overlapping sessions with the servers being proxied. An example of this could be a heartbeat to a license server.

To prevent this, the key type can be set to IP_WITH_FIRST_MESSAGE_SLICE. TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START and TCPFFEE_KEY_FIRST_MESSAGE_SLICE_END must also be set. This key type creates the key by appending to the IP address a slice of the first message the client program sends. This can be useful to prevent load balancing issues with NATs, if the client program's first message is always the same format and identifies itself with something unique (such as a username or hostname).
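
The key construction described above might look roughly like the following. It is a sketch under stated assumptions, not the proxy's implementation: `proxy_key` is a hypothetical function, and the slice is assumed to be a zero-based, half-open byte range (`start` inclusive, `end` exclusive), which the README does not specify.

```python
from typing import Mapping


def proxy_key(client_ip: str, first_message: bytes, env: Mapping[str, str]) -> bytes:
    """Build the stickiness key for a client from its IP and, optionally, its first message."""
    key_type = env.get("TCPFFEE_KEY_TYPE", "IP")
    if key_type == "IP":
        return client_ip.encode()
    if key_type == "IP_WITH_FIRST_MESSAGE_SLICE":
        start = int(env["TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START"])
        end = int(env["TCPFFEE_KEY_FIRST_MESSAGE_SLICE_END"])
        if not 0 <= start < end:
            raise ValueError("slice end must be greater than slice start")
        # Assumption: byte offsets into the first message, end exclusive.
        return client_ip.encode() + first_message[start:end]
    raise ValueError(f"unknown TCPFFEE_KEY_TYPE: {key_type}")


env = {
    "TCPFFEE_KEY_TYPE": "IP_WITH_FIRST_MESSAGE_SLICE",
    "TCPFFEE_KEY_FIRST_MESSAGE_SLICE_START": "6",
    "TCPFFEE_KEY_FIRST_MESSAGE_SLICE_END": "14",
}
# Two clients behind the same NAT address, identifying themselves by hostname.
print(proxy_key("203.0.113.7", b"HELLO host-a01 v1", env))  # b'203.0.113.7host-a01'
print(proxy_key("203.0.113.7", b"HELLO host-b02 v1", env))  # b'203.0.113.7host-b02'
```

With the plain IP key type both clients would share one key, and therefore one backend. With the slice they get distinct keys even though their source address is identical.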

Logging

The log level can be set using the RUST_LOG environment variable. More information can be found in the env_logger documentation.

Draining and Disabling Server Groups

A server group can be drained by setting drain to true in the config or by having it fail a healthcheck. A group can be disabled in the same two ways: by setting disabled to true, or by a failed healthcheck. In the healthcheck case, the groups.<id>.healthchecks[].fall_action setting must be set to Disabled, as failing groups are drained by default.
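
The interaction of fall, rise and fall_action can be sketched as a small state machine. This is an illustration rather than tcpffee's logic: `HealthcheckState` is a hypothetical class, and it assumes that `fall` and `rise` count consecutive results and that the action triggers once the count is reached.

```python
from dataclasses import dataclass


@dataclass
class HealthcheckState:
    fall: int = 3                  # failures before fall_action applies
    rise: int = 5                  # passes before the group is resumed
    fall_action: str = "Draining"  # "Draining", "Disabled" or "Up" (ignore)
    status: str = "Up"
    failures: int = 0              # consecutive failures so far
    passes: int = 0                # consecutive passes so far

    def record(self, passed: bool) -> str:
        """Record one healthcheck result and return the resulting group status."""
        if passed:
            self.failures = 0
            self.passes += 1
            if self.status != "Up" and self.passes >= self.rise:
                self.status = "Up"
        else:
            self.passes = 0
            self.failures += 1
            if self.failures >= self.fall:
                self.status = self.fall_action
        return self.status


check = HealthcheckState(fall=3, rise=2, fall_action="Disabled")
history = [check.record(result) for result in (False, False, False, True, True)]
print(history)  # ['Up', 'Up', 'Disabled', 'Disabled', 'Up']
```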

Reloading Configuration

Configuration can be reloaded by sending the SIGUSR1 signal to the process. The PID of the process is saved to /run/tcpffee.pid by default; this path can be overridden with the environment variable TCPFFEE_PID_PATH. Reloading the configuration will not cause any connections to be dropped, assuming that it is correctly configured. If the syntax or schema is wrong, the program will log this and continue with the old configuration.

Live reloading can be used to dynamically change the configuration, such as adding/removing servers.
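
A reload can be triggered from a deployment script along these lines. The sketch assumes the PID file holds the PID as plain decimal text, which is conventional but not confirmed by this README. `read_pid` and `reload_config` are hypothetical helper names.

```python
import os
import signal
from pathlib import Path
from typing import Optional


def read_pid(pid_path: Optional[str] = None) -> int:
    """Read the proxy's PID, honouring TCPFFEE_PID_PATH when no path is given."""
    path = pid_path or os.environ.get("TCPFFEE_PID_PATH", "/run/tcpffee.pid")
    # Assumption: the file contains the PID as decimal text.
    return int(Path(path).read_text().strip())


def reload_config(pid_path: Optional[str] = None) -> int:
    """Send SIGUSR1 to the proxy so it re-reads its configuration; return the PID."""
    pid = read_pid(pid_path)
    os.kill(pid, signal.SIGUSR1)
    return pid
```

Calling `reload_config()` after editing the TOML file asks the running proxy to pick up the change. Since an invalid config is only reported in the log, checking the log afterwards is the way to confirm the reload took effect.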

Statistics API

tcpffee comes with an HTTP API for accessing statistics and information about the proxy. This can currently be accessed at:

  • STATS_BIND_SOCKET/api/v1/
  • STATS_BIND_SOCKET/ui/v1/ (for an interactive way to explore the API)
  • STATS_BIND_SOCKET/spec/v1/ (for the OpenAPI specification)

where STATS_BIND_SOCKET is the value of stats_bind_socket in the proxy configuration. The default socket only listens on localhost. Be careful about opening it up, as the entire contents of the stick table are available through the API.
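
As a rough illustration, the three endpoints can be derived from the configured socket and the specification fetched with the standard library. The `http://` scheme and the helper names `stats_urls` and `fetch_spec` are assumptions. The format of the returned specification is not stated here, so the sketch returns raw bytes.

```python
from typing import Dict
from urllib.request import urlopen


def stats_urls(stats_bind_socket: str) -> Dict[str, str]:
    """Build the documented endpoint URLs for a given stats_bind_socket value."""
    base = f"http://{stats_bind_socket}"  # assumption: plain HTTP
    return {
        "api": f"{base}/api/v1/",
        "ui": f"{base}/ui/v1/",
        "spec": f"{base}/spec/v1/",
    }


def fetch_spec(stats_bind_socket: str, timeout: float = 5.0) -> bytes:
    """Download the OpenAPI specification from a running proxy."""
    with urlopen(stats_urls(stats_bind_socket)["spec"], timeout=timeout) as response:
        return response.read()


print(stats_urls("127.0.0.1:2340")["spec"])  # http://127.0.0.1:2340/spec/v1/
```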

TODO

  • Performance

  • Thorough testing of healthchecks

  • Save stick table to disk at regular intervals

  • Test proxy smooth shutdown

  • Transparent proxying

  • Use raft to synchronise state between proxy instances

    • Implement generic raft storage for use in proxy
    • Implement generic raft node for use in proxy
    • Implement communication for raft node
    • Synchronise stick tables using raft (replace postgresql)
    • Leader should reject stick table additions for entries that have not expired
    • Leader server holds elections on health of server groups to decide if they are healthy
    • Servers which falsely report server groups as healthy get shut down (they now have an endpoint to report being unhealthy instead)
  • Testing

  • Connection pooling Done!

  • ~~Live configuration editing API/config hot reloading~~ Config hot reloading done!

  • Logging Done!

  • Cache database queries Done!

  • Drain option as a config line Done!

  • Healthchecks Done!

    • Automatic draining for failed healthchecks Done!
    • Fix cache not being updated when a client is redirected because it was previously directed to a disabled server Fixed!
    • Healthcheck status is restored on server restart Done!
    • Ensure healthchecks are up to date on config reload Done!
  • Add timeouts to healthchecks
