grok-rs/file-identify

file-identify

A Rust library for identifying file types based on extensions, content, and shebangs.

Given a file (or pre-loaded file information), the library returns a set of standardized tags describing what the file is. It supports 315+ file types, with lookups optimized at compile time via PHF (perfect hash functions).

Rust port of the Python identify library.

Quick start

Add the dependency to Cargo.toml:

[dependencies]
file-identify = "0.3"

Then identify files by path or by filename:

use file_identify::{tags_from_path, tags_from_filename};

// Full identification from a filesystem path
let tags = tags_from_path("src/main.rs").unwrap();
assert!(tags.contains("file"));
assert!(tags.contains("rust"));
assert!(tags.contains("text"));

// Filename-only identification (no I/O)
let tags = tags_from_filename("Dockerfile");
assert!(tags.contains("dockerfile"));

I/O-free identification

For use with mocked or virtual filesystems (e.g., in tests), tags_from_info accepts pre-loaded file data with no filesystem access:

use file_identify::{tags_from_info, FileInfo, FileKind};

let info = FileInfo {
    filename: "script.py",
    file_kind: FileKind::Regular,
    is_executable: false,
    content: Some(b"print('hello')"),
};
let tags = tags_from_info(&info);
assert!(tags.contains("python"));
assert!(tags.contains("text"));

The FileIdentifier builder works with both paths and FileInfo:

use file_identify::FileIdentifier;

let id = FileIdentifier::new()
    .skip_content_analysis()
    .skip_shebang_analysis();

let tags = id.identify("src/main.rs").unwrap();
// Or: id.identify_from(&info);

How it works

A call to tags_from_path does this:

  1. Checks the file type (file, symlink, directory, socket). If the path is not a regular file, identification stops here.
  2. Checks permissions and adds executable or non-executable.
  3. Matches the filename or extension. If it is recognized, adds the corresponding tags (including text or binary) and stops.
  4. Otherwise, reads the first 1 KB to determine whether the file is text or binary.
  5. For text executables, parses the shebang to identify the interpreter.

By design, a file with a recognized name or extension is identified without reading its content.
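
The steps above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `Entry`, `Kind`, `sketch_tags`, `tags_for_extension`, and `shebang_interpreter` are hypothetical names, the extension table holds only three entries, the text check is a simple "no NUL byte in the first 1 KB" heuristic, and the interpreter name is used directly as a tag. The sketch also assumes that a non-regular file yields only its type tag.

```rust
use std::collections::BTreeSet;
use std::path::Path;

/// Hypothetical stand-in for the entry types checked in step 1.
#[derive(Clone, Copy, PartialEq)]
enum Kind { Regular, Directory, Symlink, Socket }

/// Hypothetical in-memory description of a file, so the sketch needs no I/O.
struct Entry<'a> {
    name: &'a str,
    kind: Kind,
    executable: bool,
    content: &'a [u8],
}

/// Tiny illustrative table; an empty slice means "extension not recognized".
fn tags_for_extension(ext: &str) -> &'static [&'static str] {
    match ext {
        "rs" => &["rust", "text"],
        "py" => &["python", "text"],
        "png" => &["png", "binary"],
        _ => &[],
    }
}

/// Extracts the interpreter name from a `#!` line, looking through `env`.
fn shebang_interpreter(content: &[u8]) -> Option<String> {
    let first_line = content.split(|&b| b == b'\n').next()?;
    let line = std::str::from_utf8(first_line).ok()?;
    let rest = line.strip_prefix("#!")?;
    let mut parts = rest.split_whitespace();
    let mut cmd = parts.next()?;
    if cmd == "env" || cmd.ends_with("/env") {
        cmd = parts.next()?; // `#!/usr/bin/env bash` names the interpreter second
    }
    cmd.rsplit('/').next().map(String::from)
}

fn sketch_tags(entry: &Entry) -> BTreeSet<String> {
    let mut tags = BTreeSet::new();

    // Step 1: file type. Anything other than a regular file stops here.
    let type_tag = match entry.kind {
        Kind::Regular => "file",
        Kind::Directory => "directory",
        Kind::Symlink => "symlink",
        Kind::Socket => "socket",
    };
    tags.insert(type_tag.to_string());
    if entry.kind != Kind::Regular {
        return tags;
    }

    // Step 2: permissions.
    let mode = if entry.executable { "executable" } else { "non-executable" };
    tags.insert(mode.to_string());

    // Step 3: a recognized extension short-circuits; content is never inspected.
    let ext = Path::new(entry.name)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    let known = tags_for_extension(ext);
    if !known.is_empty() {
        tags.extend(known.iter().map(|t| t.to_string()));
        return tags;
    }

    // Step 4: inspect at most the first 1 KB (simplified heuristic: no NUL byte).
    let head = &entry.content[..entry.content.len().min(1024)];
    let is_text = !head.contains(&0);
    let encoding = if is_text { "text" } else { "binary" };
    tags.insert(encoding.to_string());

    // Step 5: only text executables get shebang parsing.
    if is_text && entry.executable {
        if let Some(interpreter) = shebang_interpreter(head) {
            tags.insert(interpreter);
        }
    }
    tags
}
```

In this sketch, an executable named `run` whose content starts with `#!/usr/bin/env bash` comes back with the tags file, executable, text, and bash, while `src/main.rs` is tagged from its extension alone, whatever its content.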

CLI

The CLI is gated behind the cli feature so that the library itself does not depend on clap:

cargo install file-identify --features cli

file-identify src/main.rs
# ["file", "non-executable", "rust", "text"]

file-identify --filename-only Cargo.toml
# ["cargo", "text", "toml"]

Tag categories

| Category | Tags |
|----------|------|
| Type | file, directory, symlink, socket |
| Mode | executable, non-executable |
| Encoding | text, binary |
| Language | python, rust, javascript, c++, ... (315+ types) |
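
A consumer that only cares about one category can filter a returned tag set against the fixed Type, Mode, and Encoding tags listed above. The sketch below is one way to do that with the standard library. `Category`, `category_of`, and `language_tags` are hypothetical helpers, not part of the crate, and treating every other tag as "Language" is an assumption of this example.

```rust
use std::collections::BTreeSet;

/// Hypothetical mirror of the categories in the table above.
#[derive(Debug, PartialEq)]
enum Category { Type, Mode, Encoding, Language }

/// Classifies a tag; anything not in the fixed sets is assumed to be a
/// language or file-format tag.
fn category_of(tag: &str) -> Category {
    match tag {
        "file" | "directory" | "symlink" | "socket" => Category::Type,
        "executable" | "non-executable" => Category::Mode,
        "text" | "binary" => Category::Encoding,
        _ => Category::Language,
    }
}

/// Returns only the language or format tags, in sorted order.
fn language_tags(tags: &BTreeSet<String>) -> Vec<&str> {
    tags.iter()
        .map(String::as_str)
        .filter(|t| category_of(t) == Category::Language)
        .collect()
}
```

Applied to the tags from the CLI example for src/main.rs (file, non-executable, rust, text), `language_tags` keeps only rust.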

Minimum supported Rust version

The MSRV is 1.94.0 and is checked in CI.

License

MIT
