Ballarat's cultural institutions are sitting on tens of thousands of digitised images — and nobody is entirely sure how many of them are duplicates. The problem did not emerge overnight. It accumulated across roughly two decades of piecemeal scanning drives, grant-funded digitisation rounds and software migrations that never quite spoke to each other, leaving organisations from the Art Gallery of Ballarat on Lydiard Street North to the Ballarat Heritage Services unit holding multiple versions of the same photograph, map or archival plate under different file names and metadata tags.
The issue matters now because several of those organisations are in the middle of, or approaching, significant capital and systems upgrades. Ballarat Health Services has been in multi-year discussions with the state government over infrastructure renewal. Sovereign Hill, which holds one of the most commercially significant photographic archives in regional Victoria, received renewed tourism grant support in the 2025–26 Victorian budget cycle. Both developments are pushing institutions to audit and rationalise their digital holdings before committing to new content management systems, and what they are finding is messier than expected.
How the duplicates accumulated
The root cause is straightforward: Ballarat went through at least three distinct digitisation eras without a unified regional standard. The first wave came in the early 2000s, when the Ballarat Regional Genealogical Society and local library branches scanned physical records largely independently, producing low-resolution TIFFs catalogued under inconsistent naming conventions. A second round followed in the early 2010s, driven by federal and state heritage funding that required institutions to submit digitised assets to centralised repositories — but did not require them to deduplicate against their own existing holdings first. A third push, accelerating from around 2019 onward, saw organisations migrate to cloud-based collection management platforms, importing legacy files wholesale rather than auditing them.
The result, across institutions clustered in the CBD and inner precincts around Sturt Street and Dana Street, is what archivists describe as layered redundancy. The same 1890s goldfields photograph might exist as:

- an original scan from 2003;
- a second, higher-resolution scan from 2012;
- a cropped derivative created for a web gallery;
- a watermarked version produced for commercial licensing;
- a thumbnail generated automatically by a content management system during migration.

Each version carries slightly different metadata. None are flagged as related to the others.
Why fixing it is harder than it sounds
Deduplication software can catch exact or near-exact pixel matches. It struggles with the scenario Ballarat's archivists actually face: images that are genuinely different files — different resolution, different crop, different colour profile — but represent the same underlying historical object. That requires human review, and human review requires time and money that recurrent operating budgets rarely accommodate.
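To illustrate the distinction between exact and near matches, the Python sketch below runs two common automated checks on toy data. The first groups files by SHA-256 digest, so it only matches byte-identical copies. The second is a simple average hash ("aHash"), one form of perceptual hashing, compared by Hamming distance. The file names, the tiny 2×2 grayscale grids and the function names are all hypothetical. None of this represents any Ballarat institution's tooling. Real systems decode actual image files and downscale them (commonly to 8×8) before hashing.

```python
import hashlib
from collections import defaultdict

# Hypothetical toy "images": 2x2 grayscale grids with values 0-255.
scan_2003 = [[10, 200], [30, 220]]
web_copy = [[p + 15 for p in row] for row in scan_2003]  # brightened derivative


def to_bytes(grid):
    """Serialise a grayscale grid to raw bytes (stand-in for a file's contents)."""
    return bytes(p for row in grid for p in row)


def group_exact_duplicates(files):
    """Group file names whose bytes are identical, keyed by SHA-256 digest."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more files indicate exact duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(grid):
    """Average hash: 1 for each pixel brighter than the grid's mean, else 0."""
    flat = [p for row in grid for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


files = {
    "scan_2003.tif": to_bytes(scan_2003),
    "copy_of_scan_2003.tif": to_bytes(scan_2003),  # byte-identical copy
    "web_gallery.jpg": to_bytes(web_copy),         # brightened, so different bytes
}

print(group_exact_duplicates(files))  # -> [['copy_of_scan_2003.tif', 'scan_2003.tif']]
print(hamming(average_hash(scan_2003), average_hash(web_copy)))  # -> 0
```

The brightened web copy escapes the exact-match check but produces the same perceptual hash as the 2003 scan. A substantially different crop or a fresh re-scan can shift the hash in ways no fixed threshold reliably catches. That is the gap the paragraph above describes, and the reason the final judgement still falls to people.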
The Ballarat Mechanics' Institute, which holds one of the oldest lending library collections in Victoria and has been digitising portions of its photographic and ephemera holdings since at least 2008, has been working through a manual review process tied to its broader collection management project. Progress is measured in hundreds of records per volunteer shift, against a backlog estimated internally at well over 20,000 items across image and document categories combined.
The practical stakes are not abstract. When an institution publishes a duplicate under a different identifier, the same item can appear in external search indexes and aggregators, including the National Library of Australia's Trove platform, as two separate objects. Researchers may then cite both, creating bibliographic confusion that compounds over years. Licensing departments can issue rights clearances against the wrong master file. And when storage contracts are renegotiated, organisations end up paying to retain data they do not realise they already hold elsewhere in their own systems.
Regional institutions watching this space should expect the next twelve months to bring clearer guidance. The Public Record Office Victoria has been developing updated digitisation standards, and any organisation seeking state heritage or tourism grant funding from the 2026–27 round is likely to face tighter requirements around collection hygiene and deduplication declarations before funds are released. For Ballarat's archivists, that deadline is functioning as the forcing mechanism that years of good intentions never quite provided.