The Daily Ballarat

By the Numbers: Ballarat's Digital Archive Has a Duplicate Image Problem Nobody Wants to Talk About

Thousands of duplicate photographs are clogging the region's heritage databases, and the cost of cleaning them up is higher than anyone budgeted for.

By Ballarat News Desk · Published 5 July 2026, 4:57 am · Updated 5 July 2026, 1:57 pm · 4 min read

Photo: Shutter Speed on Pexels

More than 14,000 duplicate images are sitting inside Ballarat's regional heritage photo collections, inflating storage costs, confusing researchers and quietly undermining years of digitisation work funded by Victorian and federal grants. That figure — drawn from a preliminary audit completed in May 2026 by the Ballarat Regional Archives Centre on Doveton Street — marks the first time anyone has put a hard number on a problem archivists have complained about informally for nearly a decade.

The timing matters. Ballarat City Council is midway through a $2.3 million digitisation program tied to the Central Highlands Regional Partnership and due to run until June 2027. Duplicates don't just waste storage; they erode the integrity of public-facing databases like the Sovereign Hill Museum Association's image catalogue and the Ballarat & District Genealogical Society's online search tool, both of which pull records from shared repositories. When the same photograph appears under three different file names with conflicting metadata, volunteer researchers can spend hours chasing the same dead end.

What the Audit Actually Found

The May audit covered roughly 67,000 image files across four collections managed by the Ballarat Regional Archives Centre. Auditors found that 14,200 of those files — about 21 percent — were exact or near-exact duplicates introduced through successive scanning rounds between 2014 and 2024. The earliest problem batch dates to a 2016 scanning project conducted at the old Lydiard Street repository before collections were consolidated at the Doveton Street facility. Staff at the time used two different scanning rigs running separate software, and neither system cross-checked files against existing records before ingestion.
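For readers curious about what a cross-check before ingestion might look like, the sketch below groups files by a SHA-256 digest of their contents. It is an illustration only, not the archives centre's method or that of any tool named in this story. The function names are hypothetical, and a digest match catches only byte-identical files. Near-exact duplicates, such as the same print scanned twice on different rigs, would need a separate image-similarity step.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large scans are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(paths):
    """Group files by content digest and keep only groups of two or more.

    Hypothetical helper: returns {digest: [Path, ...]} for byte-identical files.
    """
    groups = {}
    for raw_path in paths:
        path = Path(raw_path)
        groups.setdefault(file_digest(path), []).append(path)
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```

Run against an incoming batch plus the existing collection, a check of this kind would flag a file whose contents already exist under another name, whatever the file is called.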

Storage costs are not trivial. The archives centre pays approximately $4,800 per year for cloud storage through a state-government contract managed under the Public Record Office Victoria framework. Eliminating verified duplicates could cut that bill by an estimated 18 percent, saving roughly $864 annually. That is modest on its own, but the figure becomes more meaningful if similar savings are repeated across the 11 regional archive nodes connected to the Central Highlands network. The deeper expense is labour. Manual review of flagged duplicates, even with the AI-assisted deduplication software trialled since March 2026, is estimated to require 420 hours of skilled archival work. At the standard Victorian public sector archival classification rate, that translates to approximately $29,400 in staff time.
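The arithmetic behind those figures can be checked in a few lines. The inputs below are the numbers reported above. The network-wide total is a back-of-envelope illustration that assumes, without any supporting data from the audit, that each of the 11 nodes would save the same amount as the Ballarat centre.

```python
# Figures as reported from the May 2026 audit.
total_files = 67_000
duplicate_files = 14_200
annual_storage_cost = 4_800   # dollars per year
saving_percent = 18           # estimated cut from removing duplicates
review_hours = 420
review_cost = 29_400          # dollars of staff time

duplicate_share = duplicate_files / total_files
annual_saving = annual_storage_cost * saving_percent / 100
implied_hourly_rate = review_cost / review_hours

# ASSUMPTION (not from the audit): every node saves the same as Ballarat.
regional_nodes = 11
network_saving = annual_saving * regional_nodes

print(f"Duplicate share of files: {duplicate_share:.1%}")
print(f"Annual storage saving: ${annual_saving:,.0f}")
print(f"Implied hourly rate for review: ${implied_hourly_rate:,.2f}")
print(f"Network saving if all nodes match: ${network_saving:,.0f}")
```

The reported figures are internally consistent: 18 percent of $4,800 is $864, and $29,400 over 420 hours works out to $70 an hour.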

Sovereign Hill has its own stake in the clean-up. The Ballarat-based living museum draws on the shared regional image database for exhibition development, educational resources and its recently relaunched Gold Museum interpretive displays on Bradshaw Street. Staff there have reported finding the same mid-19th-century goldfields photograph catalogued under different accession numbers on at least six separate occasions in 2025 alone, creating confusion for curators building the museum's new digital timeline, which is scheduled to go live in October 2026.

The Fix — and Who Pays for It

Three options are now on the table ahead of a Council briefing scheduled for late July. The first is a fully manual review, the most accurate but most expensive path at the $29,400 estimate. The second is deploying deduplication software. The archive trialled a tool called Photosift in March, which processed 10,000 images in under four hours but recorded a false-positive rate of roughly 3.4 percent, meaning some unique images would be at risk of deletion. The third option is a hybrid approach: run automated deduplication first, then apply manual verification only to files the software flags as uncertain.
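The hybrid option amounts to a triage rule, which can be sketched as follows. This is an illustration under stated assumptions, not a description of how Photosift or the archives centre works. It assumes the automated tool produces a similarity score between 0 and 1 for each candidate pair. The `CandidatePair` type, the `triage` function, and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePair:
    """Two files an automated tool suspects are the same image (hypothetical)."""
    file_a: str
    file_b: str
    similarity: float  # assumed score from 0.0 (unrelated) to 1.0 (identical)


def triage(pairs, auto_threshold=0.98, review_threshold=0.85):
    """Split candidate pairs into confident duplicates, manual review, and distinct.

    Thresholds are illustrative placeholders, not values from the trial.
    """
    if not 0.0 <= review_threshold <= auto_threshold <= 1.0:
        raise ValueError("need 0 <= review_threshold <= auto_threshold <= 1")
    confident, needs_review, distinct = [], [], []
    for pair in pairs:
        if pair.similarity >= auto_threshold:
            confident.append(pair)      # software is sure: treat as duplicate
        elif pair.similarity >= review_threshold:
            needs_review.append(pair)   # uncertain: send to an archivist
        else:
            distinct.append(pair)       # treat as separate images
    return confident, needs_review, distinct
```

Only the middle group reaches a human reviewer, which is where the saving in skilled hours would come from. Lowering the upper threshold shrinks that queue but raises the risk that a unique image is treated as a duplicate.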

The hybrid model appears to have the most support inside the archives centre, partly because it can be staged across the remaining 12 months of the Central Highlands Regional Partnership program without requiring a separate funding application. The archives centre has already written to Public Record Office Victoria seeking guidance on whether duplicate-removal costs qualify as an eligible expense under existing grant conditions.

For community members who use the Ballarat & District Genealogical Society's search tools or access heritage images through the City of Ballarat's online history portal, the practical advice is straightforward: flag inconsistencies when you find them. The archives centre has set up a dedicated feedback form — linked through the Council's website — specifically for reporting suspected duplicate or mislabelled records. Every report shortens the manual review queue. The audit team says public submissions since the form launched in June have already identified 340 duplicate pairs that automated tools missed entirely.

About this article

Published by The Daily Ballarat

This article was produced by The Daily Ballarat editorial desk and covers news in Ballarat. See our editorial standards for how we use AI.