Ballarat City Council's digital image repository now holds more than 47,000 files catalogued under its cultural heritage collection — but an internal audit completed in late June found that somewhere between 8,000 and 11,000 of those files are functional duplicates, some appearing as many as four times under different file names or metadata tags. The finding has triggered a formal remediation project that council officers say will run through to at least March 2027.
The timing matters. Sovereign Hill is midway through a federally supported expansion of its digitised gold-rush interpretive materials, and the Ballarat Heritage Office on Mair Street has been negotiating with Museums Victoria over shared collection access agreements. Both projects depend on a clean, deduplicated master image library. Duplicate records do not merely waste storage — they break cross-referencing links, produce misleading search results for researchers, and in some cases have caused the same photograph to be licensed twice, creating potential intellectual property complications that council's legal team is now reviewing.
How a Decade of Good Intentions Produced a Messy Archive
The roots of the problem stretch back to 2014, when Ballarat City Council received a Victorian government Local History Digitisation grant — one of a series administered through Public Record Office Victoria — to scan physical photographs held at the Ballarat Library on Doveton Street North. The project was a genuine achievement: roughly 6,200 glass plates and photographic prints were scanned at archival resolution and loaded into the council's then-current content management system.
The trouble began after that first grant closed. A second digitisation round in 2017, funded partly through the Regional Arts Victoria community arts partnership program, used a different contractor and a different file-naming convention. When those images were ingested into the same repository, the system did not recognise that several hundred images already existed in slightly different form. Both versions were retained.
Then came two software platform migrations: one in 2019 and a more substantial one in 2022, when council moved its broader records management onto a new enterprise system. Each migration carried legacy data forward without first running a deduplication pass. By the time the 2022 migration was complete, three generations of file-naming conventions coexisted in a single database. Some images acquired during the Art Gallery of Ballarat's 2020 digitisation partnership, which focused on works from the gallery's permanent collection on Lydiard Street North, were also absorbed into the general heritage repository rather than maintained as a discrete dataset, adding another layer of complexity.
What the Remediation Project Involves
Council engaged Canberra-based digital asset management consultancy Recordpoint Advisory in May 2026 to lead the deduplication work. The project scope, outlined in council's June 17 ordinary meeting agenda papers, involves three stages: automated hash-matching to identify exact duplicates, manual review of near-duplicates where metadata differs, and a final quality assurance pass against the original physical items held at the Ballarat Library and the Ballarat Heritage Office.
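The agenda papers do not say which hashing method or tooling the consultancy will use. As a rough illustration of how the first stage can work, the Python sketch below groups files by the SHA-256 digest of their contents, so that byte-identical copies are matched regardless of file name. The function names and the choice of SHA-256 are assumptions made for illustration, not details drawn from the project scope.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by content hash; keep groups with 2+ files.

    Hypothetical helper: file names and metadata tags are ignored,
    only the raw bytes are compared.
    """
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

A pass of this kind only catches copies that are identical byte for byte. A photograph that was rescanned, resized or re-saved in another format produces a different digest, which is why the scope also includes a manual review of near-duplicates.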
The cost, as reported in those agenda papers, is $214,000 — split between an existing IT capital works budget line and a contribution from the Sovereign Hill Museums Association, which has a direct interest in the outcome given its ongoing digitisation commitments. The association manages the Sovereign Hill outdoor museum on Bradshaw Street and has been building its own interpretive digital archive since 2021.
Researchers using Trove, the National Library of Australia's aggregator, which pulls from Ballarat's collection, have periodically flagged duplicate records through Trove's community correction tool. Those flags numbered 340 in the 2024-25 financial year, up from 91 the year before, a signal that the problem was growing, not stabilising.
Council officers say the automated deduplication phase should be complete by October 2026. Once the clean library is confirmed, the Heritage Office plans to release updated access terms for external researchers, including regional schools and genealogical societies that regularly draw on the Ballarat photographic record. Anyone currently working with downloaded images from the council's online portal is advised to check back after October, when file identifiers may change as part of the remediation process.