File Discovery
A scan is the process of indexing your filesystem to detect new media files and file changes/updates. Scans are essential for keeping your media libraries up-to-date in Stump. The scanner is a queueable process which will perform these scans for you.
Scans can be run at either the library level or the series level. The only difference is scope: a library scan covers every series in the library, while a series scan covers only the selected series.
How does it work?
When you start a scan, Stump will walk your filesystem to detect any new, updated, or otherwise changed series and media. It will then insert these changes into the database, which will make them available to you in the UI.
A high-level overview of the scan process is as follows:
Preliminary checks
Does the library exist on disk? If not, the scan will fail and adjust the library status accordingly.
Build the main task queue
Stump will recursively traverse the filesystem (from the library root) to detect:
- New and valid series
- Missing series
Both are handled immediately and in chunks. Any series which are not missing, including those which were created in the last sub-step, will be added to the main task queue as a WalkSeries task.
Walk series
This task doesn’t do much itself aside from discovery operations. Stump will recursively traverse the filesystem (from the series root) to detect:
- New and valid media
- Updated media (based on last modified time on disk)
- Missing media
These discoveries are then queued separately so that each can be handled in isolation: they are added to the front of the main task queue.
This growing queue might seem like a problem, but it is intentional. To keep the scan process as efficient as possible, Stump does not compute all required tasks at the start; tasks are added as each series is walked. As libraries grow in size, this can be a significant performance improvement.
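The growing queue can be sketched as follows. This is a minimal illustration, not Stump's implementation: the task tuples, the `run_scan` function, and the `walk` callable are all hypothetical names chosen for the example.

```python
from collections import deque


def run_scan(series_paths, walk):
    """Process a task queue that grows while it is being drained.

    `walk` is a hypothetical callable that maps a series path to its
    discoveries, e.g. {"new": [...], "updated": [...], "missing": [...]}.
    """
    # Only the walk tasks are known at the start of the scan.
    queue = deque(("walk_series", path) for path in series_paths)
    handled = []
    while queue:
        task = queue.popleft()
        if task[0] == "walk_series":
            discoveries = walk(task[1])
            # Push follow-up work to the FRONT of the queue, so the media of
            # one series is handled before the next series is walked.
            for kind in reversed(("new", "updated", "missing")):
                paths = discoveries.get(kind, [])
                if paths:
                    queue.appendleft(("handle_media", kind, paths))
        else:
            # A real scanner would touch the database here.
            handled.append(task)
    return handled
```

Because media tasks only exist once their series has been walked, the work for one series is computed and finished before the next series is even inspected.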
Handle media tasks
Stump will handle the media tasks in the queue, which includes:
- Adding new media to the database
- Updating media in the database
- Marking missing media in the database
All of these are done in chunks, both to speed up processing and to avoid resource exhaustion. The chunk size is planned to be configurable in the future.
New media
Stump will take a chunk of paths and process/build their DB representations in parallel.
Once all paths in the chunk have been processed, they are inserted into the database one by one. This decision was made to avoid situations where one bad file would kill the entire batch of inserts. However, it is trivial to change this behavior in the future if needed.
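As a rough sketch of this approach (not Stump's actual code), the example below builds each chunk in parallel and then inserts the rows one at a time, so a bad file only costs its own row. The `media` table layout, the chunk size, and the `build_media` helper are assumptions made for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CHUNK_SIZE = 2  # assumed value, kept small for the example


def build_media(path):
    """Hypothetical builder: turn a path into a row, or raise on a bad file."""
    p = Path(path)
    if p.suffix not in {".cbz", ".epub", ".pdf"}:
        raise ValueError(f"unsupported file: {path}")
    return {"name": p.stem, "path": str(p)}


def build_safely(path):
    """Return (row, None) on success or (None, message) on failure."""
    try:
        return build_media(path), None
    except Exception as exc:
        return None, f"{path}: {exc}"


def insert_new_media(conn, paths, chunk_size=CHUNK_SIZE):
    inserted, errors = 0, []
    for start in range(0, len(paths), chunk_size):
        chunk = paths[start:start + chunk_size]
        # Build the representations for the whole chunk in parallel.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(build_safely, chunk))
        # Insert one row at a time: a failing row is logged and skipped
        # instead of aborting the rest of the chunk.
        for row, error in results:
            if error:
                errors.append(error)
                continue
            try:
                conn.execute(
                    "INSERT INTO media (name, path) VALUES (:name, :path)", row
                )
                inserted += 1
            except sqlite3.Error as exc:
                errors.append(f"{row['path']}: {exc}")
        conn.commit()
    return inserted, errors
```

The trade-off is more round trips to the database in exchange for failure isolation; a batched insert would be faster but would fail as a unit.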
Updated media
The process for updated media is exactly the same, except that Stump diffs the newly built media representation with what already exists. The result of this diff is then used to update the database.
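One way to picture the diff step is shown below. It is a simplified sketch that assumes media records are flat dictionaries and that the table is called `media`; neither detail comes from Stump itself.

```python
def diff_media(existing, rebuilt):
    """Return only the fields whose value changed in the rebuilt record."""
    return {
        key: value for key, value in rebuilt.items() if existing.get(key) != value
    }


def build_update(media_id, changes):
    """Build a parameterized UPDATE for the changed columns.

    Returns None when nothing changed. Column names are interpolated into
    the SQL text, so they must come from trusted code, never from user input.
    """
    if not changes:
        return None
    assignments = ", ".join(f"{column} = ?" for column in changes)
    return f"UPDATE media SET {assignments} WHERE id = ?", [*changes.values(), media_id]
```

For example, if only the file size changed on disk, the diff contains a single key and the resulting statement touches a single column.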
Missing media
Stump issues a single UPDATE query for the entire set of missing media. This is generally safe since there is just one text column to update.
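A single-statement update of this kind might look like the sketch below. The `media` table, the `status` column, and the `'MISSING'` value are assumptions made for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3


def mark_missing(conn, missing_paths):
    """Flag every missing file with one UPDATE and return the affected row count."""
    if not missing_paths:
        return 0
    # One placeholder per path. Databases cap the number of bound parameters,
    # so a very large set may still need to be split into a few statements.
    placeholders = ", ".join("?" for _ in missing_paths)
    cursor = conn.execute(
        f"UPDATE media SET status = 'MISSING' WHERE path IN ({placeholders})",
        list(missing_paths),
    )
    conn.commit()
    return cursor.rowcount
```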
Cleanup
Stump will perform some cleanup operations, such as:
- Inserting any runtime logs (errors, warnings, etc) into the database
- Updating the library’s last scan time (TODO: this is not yet implemented)
Optional Processing
You can enable or disable certain processing options in the scanner.
Metadata
When the Process metadata option is enabled, Stump will attempt to extract metadata from your media files, for example from ComicInfo.xml or OPF files in comic books and ebooks, respectively. This metadata is then stored in the database and can be used to search, filter, and categorize your media.
See the metadata guide for more information.
File Hashing
There are two different hashing options available in Stump. They serve different purposes and may be enabled or disabled independently:
- A generic hash used for file deduplication. This is useful for being able to identify duplicates and prevent clutter in your library.
- A KoReader-compatible hash, used exclusively for KoReader compatibility. See the KoReader integration guide for more information.
If you don’t care about either of these features, you can disable hashing entirely to save on processing time.
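To illustrate how a generic hash supports deduplication, the sketch below groups files by a digest of their full contents. SHA-256 and the function names are assumptions for the example; the source does not say which algorithm Stump uses, and the KoReader-compatible hash is a different scheme that is not shown here.

```python
import hashlib
from collections import defaultdict


def file_hash(path, block_size=1 << 20):
    """Hash a file in fixed-size blocks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash and return the groups with more than one file."""
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[file_hash(path)].append(path)
    return [group for group in by_hash.values() if len(group) > 1]
```

Reading every byte of every file is what makes hashing costly on large libraries, which is why disabling it saves processing time.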
File Conversion
Stump supports converting RAR files to ZIP. You can enable this by checking the Convert RAR to ZIP option. Note that unless you also enable the Delete RAR after conversion option, the RAR files will remain on disk after conversion.
Ignore Rules
If you have files that you don’t want Stump to scan, you can define a glob pattern which will be used to filter out any candidate files which match the pattern.
Stump no longer supports .stumpignore files. Instead, ignore rules can be set during library creation or in the Scanning section of the library settings. There is no limit to the number of ignore rules you can set, so long as each is a valid glob.
Please note that in some scenarios, updating the ignore rules may not have the desired effect. For example, if you add an ignore rule for a file which is already in the database, the file will not be removed. This is planned to be addressed in the future.
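The filtering described in this section can be approximated as follows. The sketch uses Python's `fnmatch` module, whose pattern semantics may differ from the glob implementation Stump uses (here `*` also matches path separators), and the example rules and paths are made up.

```python
from fnmatch import fnmatchcase


def is_ignored(relative_path, ignore_rules):
    """Return True when the path matches at least one ignore rule."""
    return any(fnmatchcase(relative_path, rule) for rule in ignore_rules)


def filter_candidates(paths, ignore_rules):
    """Drop every candidate file that matches an ignore rule."""
    return [path for path in paths if not is_ignored(path, ignore_rules)]
```

With the rules `*.tmp`, `*/.DS_Store`, and `Extras/*`, a candidate such as `Series/01.cbz` is kept, while `Series/01.cbz.tmp` and `Extras/cover.jpg` are filtered out.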
Scheduling scans
You can configure the scheduler to run scans at a specific interval. This is useful for keeping your media libraries up-to-date without having to manually run scans.
To configure the scheduler, navigate to /settings/jobs, scroll to the Job Scheduling section towards the top of the page, fill out your desired interval (in seconds), and click the Save scheduler changes button.
For convenience, there are a few preset options you may select from the dropdown menu. These are:
- Every 6 hours (21600 seconds)
- Every 12 hours (43200 seconds)
- Every 24 hours (86400 seconds)
- Once a week (604800 seconds)
- Once a month (2592000 seconds)
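If you want an interval that is not in the list, the value in seconds is easy to work out. The snippet below reproduces the preset values as a check; the `PRESETS` mapping and `interval_seconds` helper are illustrative names and not part of Stump.

```python
from datetime import timedelta

PRESETS = {
    "Every 6 hours": timedelta(hours=6),
    "Every 12 hours": timedelta(hours=12),
    "Every 24 hours": timedelta(hours=24),
    "Once a week": timedelta(weeks=1),
    # The 2592000-second preset corresponds to a 30-day month.
    "Once a month": timedelta(days=30),
}


def interval_seconds(interval):
    """Convert a timedelta into the whole number of seconds the scheduler expects."""
    return int(interval.total_seconds())


for label, interval in PRESETS.items():
    print(f"{label}: {interval_seconds(interval)} seconds")
```

A custom interval works the same way: `interval_seconds(timedelta(hours=2))` gives the value to enter for a scan every two hours.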
In the future, this section of the UI will change to include scheduling options for more than just scans. However, for now, it is only for scans.