Ballarat's major cultural institutions are working through a significant housekeeping problem that built up over more than a decade: digital image collections riddled with duplicate files, some records appearing three or four times across separate cataloguing systems, eating up storage and quietly undermining the reliability of public-facing archives.
The issue matters now because several organisations, including the Art Gallery of Western Victoria on Lydiard Street North and the Ballarat Heritage Office within the City of Ballarat council, are midway through digitisation programs that were designed, in part, to make holdings permanently accessible online. Discovering that a meaningful share of what they uploaded was already there, just filed under a different filename or accession number, has forced a rethink of workflow and quality-assurance processes before those public portals go fully live.
How the Duplicates Accumulated
The problem did not arrive overnight. Between roughly 2008 and 2022, Ballarat institutions cycled through at least three major digital asset management platforms as technology standards shifted and grant funding came and went. Each migration carried the risk that files would be copied into the new system while the originals stayed behind in the old one. Staff turnover compounded the issue; institutional knowledge about what had already been scanned and catalogued often walked out the door with departing employees.
Sovereign Hill, the open-air museum on Bradshaw Street whose collection documents the 1850s goldfields era, undertook a significant collection audit in 2023 as part of its broader strategic redevelopment. That process identified cataloguing inconsistencies that are common across regional collecting institutions nationally, not a failure unique to Ballarat. The Museum of Australian Democracy at Eureka — MADE — on Eureka Street faced comparable challenges after it took on additional photographic material from community donors during the COVID-19 closure period in 2020 and 2021, when volunteer cataloguers were working remotely without uniform file-naming conventions.
The practical consequences are not trivial. Storage costs money. Regional cultural institutions typically run on tight operating budgets, and paying cloud-storage fees for redundant files is a recurring line item that diverts resources from acquisition, conservation and public programming. A 2024 survey by the Australian Institute for the Conservation of Cultural Material found that duplicate digital files accounted for an average of 18 per cent of total storage volume across a sample of 40 regional collecting institutions, a figure that translated to a measurable annual cost for organisations already stretched thin.
The Cleanup Process and What Comes Next
The standard remediation approach involves running deduplication software across entire repositories, then manually reviewing flagged matches before deletion — because automated tools can wrongly identify two genuinely distinct images as duplicates if metadata is incomplete. That manual review stage is labour-intensive. The City of Ballarat's library service, which holds the Local History Collection at the Ballarat Library on Camp Street, has been working through its photographic holdings in stages since mid-2024, prioritising the most heavily accessed parts of the collection first.
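As a rough illustration of the flag-then-review pattern described above, the sketch below groups files by a SHA-256 hash of their contents and returns the groups for a person to inspect. It is a minimal example, not the software any Ballarat institution is known to use; the function names `sha256_of` and `find_duplicate_groups` are hypothetical. Content hashing is a deliberately conservative choice: it only catches byte-for-byte identical copies stored under different filenames, and it will miss re-scans or re-encoded versions of the same photograph. Nothing is deleted automatically.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large scans are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are byte-for-byte identical.

    Hypothetical helper: it only flags candidates. Deciding which copy to keep
    is left to a human reviewer, as in the manual review stage.
    """
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Keep only hashes shared by more than one file.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicate_groups(Path("scans"))` on a hypothetical `scans` folder would return one list per set of identical files, which could then be written to a review queue for collections staff.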
Ballarat Heritage Weekend, held each May, has historically generated a surge in public requests to access digitised historical photographs, particularly images of Sturt Street, the Ballarat Botanical Gardens, and the Lake Wendouree foreshore. Ensuring those high-demand records are clean, correctly described and free of confusing duplicate entries is one practical driver pushing institutions to accelerate the work now rather than defer it.
For members of the public who use these digital archives, whether researching family history, preparing heritage applications to the council's Planning and Environment division, or sourcing imagery for community projects, the immediate advice from library and collections staff is straightforward: if a search returns what looks like the same image multiple times, flag it using the feedback mechanism on whichever portal you are using. Those reports feed directly into the review queue. The deduplication work will not finish this calendar year, but the institutions involved say the most disruptive phase of the project should be complete before the 2027 digitisation funding round opens through Creative Victoria.