Proposal: uncompressed input size

Currently `input_size` is the size of the raw input, which can be either compressed or uncompressed. When scaling memory based on input size you probably only care about the uncompressed size. gzip does store the uncompressed size (modulo 2^32, so it wraps for data of 4 GiB or more), which we could read into a separate `uncompressed_input_size` variable. It is stored in the last 4 bytes of the file; this seems to work for me:

```python
#!/usr/bin/env python3
import os
import sys

path = sys.argv[1]

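with open(path, 'rb') as f:
    # The gzip trailer ends with ISIZE: the uncompressed size modulo 2**32,
    # stored as a 4-byte little-endian integer.
    f.seek(-4, os.SEEK_END)
    size = int.from_bytes(f.read(4), 'little')
    print(size)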
```

The stored uncompressed size also isn't always set properly. For example, `gzip -l` reports 0 for this BAM file:

```console
nate@pdp-11% gzip -l /home/nate/work/galaxy/test-data/1.bam
         compressed        uncompressed  ratio uncompressed_name
               3592                   0   0.0% /home/nate/work/galaxy/test-data/1.bam
```

So we should have a default: either the actual size, or the actual size times some constant factor.
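A rough sketch of what that fallback could look like, in case it helps the discussion. The function name `estimate_uncompressed_size` and the `fallback_factor` parameter are made up for illustration and are not part of any existing code. It also assumes that a stored size of 0 on a non-empty gzip file means "unknown", which may not be the right rule for every case:

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member
MIN_GZIP_SIZE = 18        # 10-byte header + 8-byte trailer, before any data


def estimate_uncompressed_size(path, fallback_factor=1.0):
    """Hypothetical helper: best-effort uncompressed size of a file, in bytes.

    fallback_factor is an assumed tuning knob, applied to the on-disk size
    when the gzip trailer does not give a usable value.
    """
    actual_size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Not gzip (or too short to be valid gzip): the raw size is the answer.
        if actual_size < MIN_GZIP_SIZE or f.read(2) != GZIP_MAGIC:
            return actual_size
        # ISIZE: last 4 bytes, little-endian, uncompressed size modulo 2**32.
        f.seek(-4, os.SEEK_END)
        isize = int.from_bytes(f.read(4), "little")
    if isize == 0:
        # Stored size is unusable (as in the 1.bam case), so fall back
        # to the actual size times a constant factor.
        return int(actual_size * fallback_factor)
    return isize
```

With `fallback_factor=1.0` this degrades to the actual size, which matches the first of the two defaults suggested above. It does not try to detect the wraparound for inputs of 4 GiB or more.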
