Ballarat's cultural institutions are midway through a painstaking effort to strip thousands of duplicate images from their shared digital archives — a problem that built quietly over nearly two decades of piecemeal scanning projects, grant cycles that rewarded volume over accuracy, and software systems that never talked to each other properly.
The duplication issue matters now because regional councils and state-funded bodies are under growing pressure to demonstrate value from digital infrastructure spending. Public Record Office Victoria has tightened compliance expectations for regional repositories since 2023, and institutions that cannot demonstrate clean, deduplicated collections risk losing access to future digitisation funding rounds. For Ballarat, where the gold-rush heritage identity is tied directly to photographic and archival holdings, the stakes are unusually high.
How the problem accumulated
The origins trace back to the early 2000s, when Ballarat's major cultural institutions — including the Art Gallery of Ballarat on Lydiard Street North and the Ballarat Heritage Services arm of the City of Ballarat — each began independent digitisation programs. Sovereign Hill, which draws roughly 500,000 visitors annually to its Bradshaw Street site, ran its own separate photographic cataloguing effort tied to tourism grant applications through Regional Development Victoria.
Each program used different metadata standards. Scans made for one grant application were often rescanned for another, with no central registry to flag that the image already existed. By the time the Museum of Australian Democracy at Eureka — known as MADE, on Stawell Street — opened its expanded digital gallery in 2015, there were at least three institutions in central Ballarat holding overlapping collections of images from the 1854 Eureka Stockade and surrounding goldfields period, none of them systematically cross-referenced.
The Federal Government's Trove platform, administered by the National Library of Australia, absorbed many of these contributions over successive years. That aggregation, useful as it was for public access, also amplified the duplication: the same glass-plate negative, scanned by two different institutions on two different occasions, could appear in Trove under different identifiers, different titles and different rights statements.
The audit and what it found
A formal audit commissioned through the Ballarat Regional Libraries network and delivered in the second half of 2024 found that more than 14,000 image records across the four main contributing institutions were either exact duplicates or near-identical scans of the same physical object. That figure represented roughly 22 per cent of the combined digital holdings reviewed. The audit cost approximately $47,000, funded through a State Library Victoria community heritage grant.
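The two audit figures above also imply a rough size for the combined holdings that were reviewed. The short calculation below is only a back-of-envelope sketch: the function name is invented for illustration, and because both inputs are rounded in the report ("more than 14,000", "roughly 22 per cent"), the result should be read as an approximation of about 64,000 records, not a figure from the audit itself.

```python
def implied_total(duplicates: int, share: float) -> int:
    """Estimate the total records reviewed from a duplicate count and its share of that total."""
    if not 0 < share <= 1:
        raise ValueError("share must be a fraction greater than 0 and at most 1")
    return round(duplicates / share)


# Inputs are the rounded figures quoted from the audit, so the output is approximate.
total = implied_total(14_000, 0.22)
print(f"Implied records reviewed: about {total:,}")
```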
The report identified three structural causes: grant conditions that measured outputs in raw file counts rather than unique objects; the absence of a shared identifier system across institutions before 2019; and staff turnover rates at regional galleries and libraries that meant institutional memory about what had already been scanned was regularly lost. Ballarat Central Library on Doveton Street had changed digitisation coordinators four times between 2008 and 2020, according to the audit's methodology section.
Deduplication work began in earnest in early 2025. It combines automated matching software with manual review, because inconsistent metadata means automated tools cannot reliably identify near-duplicates without human confirmation. The process is expected to run until at least mid-2027.
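The split between automated matching and manual review can be illustrated with a minimal sketch. It does not represent the software the Ballarat institutions are using, which the audit coverage does not name. It assumes byte-identical files can be merged automatically by comparing SHA-256 digests, while records whose titles are merely similar are only queued for a person to check. The record structure, the identifiers, the sample titles and the 0.85 similarity threshold are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class ImageRecord:
    record_id: str   # hypothetical identifier scheme
    title: str       # free-text title as catalogued
    content: bytes   # raw bytes of the scanned file


def normalise(title: str) -> str:
    """Lower-case a title and replace punctuation with spaces before comparing."""
    kept = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(kept.split())


def find_duplicates(records, threshold=0.85):
    """Return (exact, review): byte-identical pairs, and similar-title pairs for manual review."""
    digest = {r.record_id: hashlib.sha256(r.content).hexdigest() for r in records}
    exact, review = [], []
    for a, b in combinations(records, 2):
        if digest[a.record_id] == digest[b.record_id]:
            exact.append((a.record_id, b.record_id))
            continue
        score = SequenceMatcher(None, normalise(a.title), normalise(b.title)).ratio()
        if score >= threshold:
            # Different files with near-matching titles: a person must confirm.
            review.append((a.record_id, b.record_id, round(score, 2)))
    return exact, review


# Invented sample records, for illustration only.
records = [
    ImageRecord("A-001", "Eureka Stockade site, 1854", b"scan-one"),
    ImageRecord("B-417", "Eureka Stockade site 1854.", b"scan-two"),
    ImageRecord("C-090", "Eureka Stockade site, 1854", b"scan-one"),
    ImageRecord("A-002", "Lydiard Street streetscape", b"scan-three"),
]
exact, review = find_duplicates(records)
```

In this sample, the two byte-identical files are paired automatically, while the rescanned copy with a slightly different title is placed in the review queue instead of being merged. Comparing every pair of records is workable only for small batches; a collection of tens of thousands of records would need candidates grouped first, for example by date or subject, before titles are compared.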
For anyone using these archives, whether researchers, heritage consultants or tourism operators preparing interpretive material along the Eureka Centre precinct on Stawell Street, the practical implication is that search results in local digital catalogues will keep changing as duplicates are resolved. Records may temporarily appear, disappear or be merged without notice as the work proceeds.
Institutions have been advised to note the audit period in any materials that cite digital collection sizes, since headline figures will continue to shift downward until deduplication is complete. The City of Ballarat's heritage team has flagged that the revised, cleaner collection will form the foundation for a new unified catalogue planned for launch in 2028.