
Immich DB Repair

So, you've been experimenting with Immich. Maybe you noticed that the specific way you like to organize photos didn't match what the Immich maintainers have dictated. You thought, "hey, I'm a developer. I'm sure the very strong recommendation not to mess with the file hierarchy created by Immich only applies to other people". I get it! But now, you're paying the price, and you've just realized that Immich lacks a built-in way of repairing the file structure and realigning it with the DB. All you find are forum threads where people in your situation are berated and told to start their photo libraries from scratch.

Now you're scrolling through your photos, noting that every other video doesn't load. You try to download the original photos from your latest trip to the Andes, but all you get is a server error. You are desperate. Re-uploading your photos doesn't work—Immich hashes the files and sees that they're already in the database. Immich-go is of no help, either: it also thinks you've already uploaded all of the files. Your blood pressure is rising.

Do not despair

I'm here to tell you that—believe it or not—repairing your Immich instance is absolutely doable. Regardless of what the nay-sayers online try to imply, the DB and file structure isn't that complicated. Essentially, we need to solve three problems (there's an end-to-end sketch right after this list):

  1. We need to identify all of the orphaned assets in the DB.
  2. We need to find the corresponding files somewhere else in our file system.
  3. We need to copy these originals to where Immich expects them.
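
Taken together, the repair is a short pipeline. Here's roughly how everything fits together, as a simplified sketch: the helper functions are defined in the rest of this post, and psycopg2 plus the connection string merely stand in for however you reach Immich's Postgres.

def repair(dsn: str, media_roots: List[str], workers: int = 8,
           path_prefix: str | None = None, path_replace: str | None = None) -> None:
    conn = psycopg2.connect(dsn)  # e.g. "postgresql://postgres:...@localhost:5432/immich"
    # 1. Which assets does Immich know about but can no longer find on disk?
    orphans = find_orphans(conn, path_prefix, path_replace)
    # 2. Hash every media file under our own directories, filtered by extension.
    relevant_exts = {
        os.path.splitext(p)[1].lstrip(".").lower()
        for _, _, p in orphans
        if p and os.path.splitext(p)[1]
    }
    checksum_map = hash_media_tree(media_roots, relevant_exts, workers)
    # 3. Copy each matching file back to the path Immich expects.
    matches = [(c, a, p, checksum_map[c]) for c, a, p in orphans if c in checksum_map]
    restore_matches(matches, path_prefix, path_replace)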

Finding orphans

Assets in Immich are recorded in the asset table in the Immich DB. This table has a lot of columns but, for our purposes, we just care about each asset's checksum, its ID, and the path where Immich thinks the original photo is stored. We also want to make sure that we're only looking at assets that have paths managed by Immich—so we need to ignore external libraries.

def iter_assets(conn):
    with conn.cursor() as cursor:
        # Hex-encode the bytea checksum so it compares directly against hashlib hexdigests later.
        cursor.execute('SELECT encode("checksum", \'hex\'), "id", "originalPath" FROM asset WHERE NOT "isExternal"')
        for checksum, asset_id, original_path in cursor:
            yield checksum, asset_id, original_path
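
As for where conn comes from: it's a plain Postgres connection to Immich's database. With the default Docker Compose setup, something along these lines should work, provided the database port is reachable from wherever the script runs (the credentials below are placeholders; use the values from your own .env):

import psycopg2

conn = psycopg2.connect(
    host="localhost",      # or the Docker host / container IP
    port=5432,             # adjust if you publish a different port
    dbname="immich",       # placeholder; check your .env
    user="postgres",       # placeholder
    password="postgres",   # placeholder
)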

Then, checking if these assets are orphaned is as easy as checking the asset paths. If there is a photo located at that path, then Immich can find it and all is good. On the other hand, if the path doesn't lead us to a file, then we have found an orphaned asset.

def find_orphans(conn, path_prefix: str | None, path_replace: str | None) -> List[Tuple[str, str, str | None]]:
    orphaned_assets: List[Tuple[str, str, str | None]] = []
    asset_iterator = tqdm(
        iter_assets(conn),
        total=get_asset_count(conn),
        desc="Scanning assets",
        unit="assets",
        dynamic_ncols=True,
    )

    for checksum, asset_id, original_path in asset_iterator:
        # An asset with no recorded path at all is orphaned by definition.
        if not original_path:
            orphaned_assets.append((checksum, asset_id, original_path))
            continue

        # Otherwise, the asset is orphaned if nothing exists at the (remapped) path.
        mapped_path = remap_path(original_path, path_prefix, path_replace)
        if not os.path.exists(mapped_path):
            orphaned_assets.append((checksum, asset_id, original_path))

    return orphaned_assets

Now we know which assets have lost their files and which checksums we should go searching for.
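
By the way, find_orphans leans on a get_asset_count helper that I haven't shown; it only exists to give tqdm a total. A minimal version (the full script's may differ slightly) mirroring the filter in iter_assets:

def get_asset_count(conn) -> int:
    # Same table and external-library filter as iter_assets; only used for the progress bar.
    with conn.cursor() as cursor:
        cursor.execute('SELECT count(*) FROM asset WHERE NOT "isExternal"')
        return cursor.fetchone()[0]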

Searching for files to adopt

We still have a problem, though: we need to find the real, still-existing copies of the orphaned assets somewhere on disk. In my opinion, the only truly robust way to do so is to hash all possible media files and check whether their hashes match any orphaned assets. Hashing is simple: Immich (at least as of February 2026) uses plain SHA-1, which hashlib implements for us.

def hash_file(path: str) -> str:
    hasher = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large videos don't have to fit in memory.
        for chunk in iter(lambda: handle.read(1024**2), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
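
Note that hexdigest() returns lowercase hex, which is exactly what the encode("checksum", 'hex') expression in our asset query produces. So deciding whether a file on disk is the lost original of an asset comes down to comparing two strings, along the lines of this illustrative helper:

def is_original_of(candidate_path: str, db_checksum_hex: str) -> bool:
    # Both sides are lowercase hex SHA-1 strings, so plain equality is enough.
    return hash_file(candidate_path) == db_checksum_hex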

Next, we probably want to include some parallelism. Hashing is expensive and, if the number of orphans—and your file-system shenanigans—is extensive, then we will benefit noticeably from hashing in parallel. Additionally, I thought it'd make sense to only consider files with extensions that match an orphaned asset (available in the full script).

def hash_media_tree(root_paths: List[str], relevant_exts: set[str], workers: int) -> Dict[str, List[str]]:
    hash_total = count_hash_files(root_paths, relevant_exts)
    progress = tqdm(
        total=hash_total,
        desc="Hashing files",
        unit="files",
        dynamic_ncols=True,
    )

    checksum_map: Dict[str, List[str]] = {}
    queue: Queue[str | None] = Queue(maxsize=workers * 4)
    lock = threading.Lock()

    def producer() -> None:
        for root_path in root_paths:
            for dirpath, _, filenames in os.walk(root_path):
                for filename in filenames:
                    if "." not in filename:
                        continue
                    ext = filename.rsplit(".", 1)[-1].lower()
                    if ext not in relevant_exts:
                        continue
                    file_path = os.path.join(dirpath, filename)
                    queue.put(file_path)
        # One sentinel per worker tells the workers to shut down.
        for _ in range(workers):
            queue.put(None)

    def worker() -> None:
        while True:
            path = queue.get()
            if path is None:
                return
            try:
                checksum = hash_file(path)
            except OSError:
                # Unreadable file: skip it, but still count it so the progress bar can finish.
                with lock:
                    progress.update(1)
                continue
            with lock:
                checksum_map.setdefault(checksum, []).append(path)
                progress.update(1)

    producer_thread = threading.Thread(target=producer)
    producer_thread.start()

    worker_threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in worker_threads:
        t.start()

    producer_thread.join()
    for t in worker_threads:
        t.join()

    progress.close()
    return checksum_map

A bit long for a single (well, 1 + 2) function, but creating the producer and worker functions with closures made sense to me at the time. Anyhow, this gives us a map from each hash to the files that produced it. Then, we can use this to find candidates to save our orphaned assets:

matches = [
    (checksum, asset_id, original_path, checksum_map[checksum])
    for checksum, asset_id, original_path in orphaned_assets
    if checksum in checksum_map
]
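
One more helper I've glossed over: hash_media_tree calls count_hash_files, which, like get_asset_count, only sizes the progress bar. A sketch that mirrors the producer's extension filter (the full script's version may differ):

def count_hash_files(root_paths: List[str], relevant_exts: set[str]) -> int:
    # Walk the same trees with the same extension filter, purely to size the progress bar.
    total = 0
    for root_path in root_paths:
        for _, _, filenames in os.walk(root_path):
            for filename in filenames:
                if "." in filename and filename.rsplit(".", 1)[-1].lower() in relevant_exts:
                    total += 1
    return total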

Restoring order

Finally, we can restore our orphans. Optionally, we can translate the paths we get from the Immich DB into paths that match our own file system. This is useful if Immich is running in Docker, with assets stored in a mounted volume, but the script is running from the outside.

def remap_path(path: str, prefix: str | None, replacement: str | None) -> str:
    if prefix and replacement and path.startswith(prefix):
        return replacement + path[len(prefix) :]
    return path
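
# For example (illustrative paths): if the DB stores the container path
# /usr/src/app/upload/... but the upload volume is mounted at /mnt/immich/upload
# on the host, then
#   remap_path("/usr/src/app/upload/library/2024/IMG_0042.jpg",
#              "/usr/src/app/upload", "/mnt/immich/upload")
# returns "/mnt/immich/upload/library/2024/IMG_0042.jpg".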

def restore_matches(
    matches: List[Tuple[str, str, str | None, List[str]]],
    path_prefix: str | None,
    path_replace: str | None,
) -> None:
    progress = tqdm(matches, unit="files", desc="Restoring matches")
    for _, asset_id, original_path, paths in progress:
        if not original_path:
            continue
        dest_path = remap_path(original_path, path_prefix, path_replace)
        if not dest_path:
            continue
        if os.path.exists(dest_path):
            continue
        os.makedirs(os.path.dirname(dest_path), exist_ok=True)
        # Any file with a matching checksum is byte-identical, so take the first candidate.
        source_path = paths[0]
        try:
            progress.set_postfix_str(os.path.basename(source_path))
            shutil.copy2(source_path, dest_path)
        except OSError as exc:
            raise RuntimeError(f"RESTORE FAILED {asset_id} {source_path} -> {dest_path}: {exc}") from exc

That's it! Now we've managed to save our precious media files (and the effort of cataloguing them) without having to nuke our Immich instance.

The full script

If you're convinced that this script will fix your problems, and you understand that validating this is entirely your own responsibility, then feel free to run the script. You can find it through this link.

Written on February 21, 2026