Log mill - a tool for processing raw logs into clean, usable output.
A do-it-yourself scaffold for building custom log analysis tools. It provides the architecture, patterns, and built-in features like incremental processing with state persistence. You can reuse ready-made components or implement logic specific to your use case.
A command-line TypeScript/Node.js application built from composable components:
- Parser extracts data from log entries
- Processor transforms, calculates, and aggregates data
- Reporter presents processed data as a final report
Components can be combined freely as long as data formats are compatible. For example, reuse an existing parser and processor with a new PDF reporter instead of the built-in HTML reporter.
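To make the contract concrete, here is a minimal TypeScript sketch of how such components fit together. The interface names and the `runMode` helper are illustrative, not log-mill's actual API:

```typescript
// Illustrative component contracts (hypothetical names): a mode is valid
// whenever the output type of one stage matches the input type of the next.

interface Parser<R> {
  // Turn one raw log line into a structured record (or null if unparseable).
  parseLine(line: string): R | null;
}

interface Processor<R, D> {
  // Fold parsed records into aggregated data, merging with prior state.
  process(records: Iterable<R>, previous: D | undefined): D;
}

interface Reporter<D> {
  // Render the aggregated data as a final report (HTML, PDF, ...).
  render(data: D): string;
}

// Wiring a mode: any Reporter<D> works with any Processor<R, D>, so an HTML
// reporter can be swapped for a PDF one without touching parser or processor.
function runMode<R, D>(
  parser: Parser<R>,
  processor: Processor<R, D>,
  reporter: Reporter<D>,
  lines: string[],
  previous?: D,
): string {
  const records = lines
    .map((line) => parser.parseLine(line))
    .filter((r): r is R => r !== null);
  return reporter.render(processor.process(records, previous));
}
```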
- Incremental processing with resume support
- Streaming (handles large files efficiently)
- Atomic state persistence (crash-safe; see the write-then-rename sketch below)
- Compressed file support (.gz)
- Optional YAML configuration
- HTTP server log analysis (referring websites report)
- Syslog analysis (application activity over time)
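Atomic state persistence typically means the write-to-temporary-file-then-rename pattern; a minimal sketch of the idea (the helper name is invented, not the tool's actual code):

```typescript
import { renameSync, writeFileSync } from "node:fs";

// Crash-safe state write (hypothetical helper): write the new state to a
// temporary file first, then rename it over the old file. rename() is atomic
// on POSIX filesystems, so a crash mid-write leaves the previous state intact.
function saveStateAtomically(path: string, state: unknown): void {
  const tmpPath = `${path}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(state));
  renameSync(tmpPath, path);
}
```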
Show options with `node dist/index.js --help`:

```
Usage: log-mill [options]

Analyze log files and generate reports

Options:
  -i, --input <path>       input log file path (plain text or .gz)
  -d, --output-dir <path>  output directory for reports and state
  -m, --mode <mode>        analysis mode (http-access, syslog-apps)
  -c, --config <path>      config file path (for modes requiring it)
  -h, --help               display help for command
```
Use the pre-configured example files:

```
node dist/index.js -i example/log/access.log -d output -m http-access -c example/config/http-access.config.yaml
node dist/index.js -i example/log/syslog.log -d output -m syslog-apps
```

This is the most basic use case. Log lines processed in a previous run are skipped, and the report is updated with new data.
Just point log-mill to a current log file and run it as frequently as you wish.
If you want to automate this and your logs are rotated, make sure to run log-mill before rotation.
Scenario
- Logs are rotated; we have files like `access.log`, `access.log.1`, `access.log.2.gz`, ..., `access.log.10.gz`.
- We want to generate a report combining data from all historical files, not only from the current log.
- Plain text and compressed files are mixed.
Solution
- Run `log-mill` for every log file separately, starting from the oldest, keeping the same `output-dir`.
- Incremental processing also works with multiple files; newer data is added to previous data.
- Compressed files are handled out of the box (gzip only).
Example Bash script:

```bash
ls -tr "/var/log/apache/"access.log* | while IFS= read -r file; do
  node dist/index.js -i "$file" -d output -m http-access -c my-website.config.yaml
done
```

The http-access mode is built from:
- Parser: parse webserver log in combined format
- Processor: calculate number of entries per day and collect external referrers
- Reporter: save report as HTML file
The syslog-apps mode is built from:
- Parser: parse system logs in Syslog format
- Processor: aggregate number of log entries per application and per day (see the data-shape sketch after this list)
- Reporter: save report as HTML file with interactive charts per application
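As an illustration of the incremental aggregation this processor performs, the per-application, per-day counts might be shaped like this (the type name and merge helper are assumptions, not the actual source):

```typescript
// Hypothetical shape of the syslog-apps aggregate: entry counts keyed by
// application name, then by ISO day. String keys and numbers only, so the
// state round-trips through JSON between runs.
type SyslogAppsData = Record<string, Record<string, number>>;

// Incremental step: fold a new batch of counts into the previous state.
function merge(previous: SyslogAppsData, batch: SyslogAppsData): SyslogAppsData {
  const result: SyslogAppsData = structuredClone(previous);
  for (const [app, days] of Object.entries(batch)) {
    result[app] ??= {};
    for (const [day, count] of Object.entries(days)) {
      result[app][day] = (result[app][day] ?? 0) + count;
    }
  }
  return result;
}
```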
- Each analysis mode is composed of 3 component types: `Parser`, `Processor`, and `Reporter`, wired together in `index.ts`.
- The same component implementation can be used in multiple modes, as long as the data types are compatible: the `Parser` returns the same data type the `Processor` accepts, and the `Processor` returns the same data type the `Reporter` accepts.
- Components requiring additional configuration implement the `Configurable` interface. Its `configure` method, called automatically, receives `ConfigData` with the contents of the parsed configuration file, a YAML file provided as a CLI parameter. If several components need configuration, they share the same file.
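A hedged sketch of that contract, with names taken from the description above (the `websiteName` config key is invented for illustration):

```typescript
// Approximation of the Configurable contract described above.
type ConfigData = Record<string, unknown>; // parsed from the shared YAML file

interface Configurable {
  configure(config: ConfigData): void; // called automatically before a run
}

// Example: a reporter that reads a (hypothetical) websiteName key.
class HtmlReporter implements Configurable {
  private title = "Report";

  configure(config: ConfigData): void {
    if (typeof config.websiteName === "string") {
      this.title = config.websiteName;
    }
  }

  render(): string {
    return `<h1>${this.title}</h1>`;
  }
}
```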
To add an example mode with a new log format, the following files are affected:
```
src/
├── index.ts                      # Register mode
├── parsers/
│   ├── formats/
│   │   └── new-log-format.ts     # Implement LogFormat
│   └── example-parser.ts         # Build ParsedRecord<ExampleRecord>
├── processors/
│   └── example/
│       └── processor.ts          # Process ParsedRecord<ExampleRecord> → ExampleData
│                                 # Merge with previous state (incremental)
└── reporters/
    └── example/
        └── reporter.ts           # Generate report from ExampleData
```
New data types:
- `ExampleRecord` - data extracted from a single log line.
- `ExampleData` - aggregated and calculated data from multiple log lines; must be JSON-serializable because it is persisted in JSON format between runs.
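For illustration, the two types might look like this (the fields are invented; only the JSON-serializability constraint comes from the text above):

```typescript
// Hypothetical data types for the example mode.

// Extracted from a single log line; records are transient, so any shape works.
interface ExampleRecord {
  timestamp: string;
  level: string;
  message: string;
}

// Aggregated across log lines and persisted as JSON between runs, so it must
// use JSON-serializable values only (no Date, Map, Set, or functions).
interface ExampleData {
  entriesPerLevel: Record<string, number>;
  firstSeen: string; // ISO date string rather than a Date object
}
```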
