Photo Metadata for Genealogy and Family Archive Scanning

By Duncan Rawlinson

Family photos arrive in shoeboxes, photo albums with crumbling adhesive, and envelopes labeled in the handwriting of someone who is no longer around to ask. Scanning them is the easy part. The hard part is the metadata: who is in the picture, when was it taken, where, and what was happening.

Get the metadata right and a family archive becomes searchable, sharable, and durable across generations. Skip it and you have a digital shoebox that future relatives will need to reconstruct from scratch. The work is worth doing once, well.

This guide covers how to think about family photo metadata the way professional archivists do, how to use AI to accelerate the obvious parts, and how to make absolutely certain you separate verified facts from educated guesses so you do not poison your family tree with confident inventions.

Scanning the Shoebox

Before you can apply metadata, you need digital files. The order of operations matters because some scanning choices affect what kind of metadata you can later add.

Scan at Archival Resolution

For prints up to 8x10, scan at 600 dpi minimum. For smaller prints (Polaroids, wallet photos), scan at 1200 dpi to capture detail you might lose otherwise. Save as TIFF for the master archive and JPEG for working copies. Storage is cheap; rescanning Aunt Linda's collection is not.
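To get a feel for what those resolutions mean in practice, here is a rough sketch of the arithmetic. It assumes uncompressed 24-bit RGB (3 bytes per pixel) and a wallet print of roughly 2.5 x 3.5 inches; both are illustrative assumptions, and real TIFF sizes vary with bit depth and compression.

```python
def scan_size(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return (pixel_width, pixel_height, uncompressed_megabytes) for one print.

    bytes_per_pixel=3 assumes 24-bit RGB with no compression.
    """
    px_w = round(width_in * dpi)
    px_h = round(height_in * dpi)
    megabytes = px_w * px_h * bytes_per_pixel / 1_000_000
    return px_w, px_h, megabytes


# An 8x10 print at 600 dpi, and a wallet-sized print at 1200 dpi.
for label, w, h, dpi in [("8x10 print", 8, 10, 600), ("wallet print", 2.5, 3.5, 1200)]:
    px_w, px_h, mb = scan_size(w, h, dpi)
    print(f"{label}: {px_w} x {px_h} px, about {mb:.1f} MB uncompressed")
```

An 8x10 at 600 dpi comes out at 4800 x 6000 pixels, which is why the master files are large and why a separate JPEG working copy is worth keeping.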

Scan the Backs Too

Many old prints have writing on the back: names, dates, locations, occasions. Scan the back as a separate image with the same base filename ("1972-cousins-front.jpg" and "1972-cousins-back.jpg"). The handwritten notes are often your only ground truth and they should be preserved alongside the front.
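If you follow the "-front" / "-back" naming pattern, a short script can confirm that every front has its back. This is a minimal sketch; the function names are made up for illustration, and it assumes the side marker is always the last hyphen-separated part of the filename.

```python
from collections import defaultdict
from pathlib import PurePath


def pair_scans(filenames):
    """Group scans by shared base name.

    Returns (pairs, unmatched): pairs maps base name -> {"front": ..., "back": ...},
    unmatched lists files that do not end in -front or -back.
    """
    pairs = defaultdict(dict)
    unmatched = []
    for name in filenames:
        stem = PurePath(name).stem                 # drop the extension
        base, sep, side = stem.rpartition("-")     # split on the last hyphen
        if sep and side in ("front", "back"):
            pairs[base][side] = name
        else:
            unmatched.append(name)
    return dict(pairs), unmatched


def fronts_missing_backs(pairs):
    """Base names that have a front scan but no back scan."""
    return sorted(base for base, sides in pairs.items()
                  if "front" in sides and "back" not in sides)


files = ["1972-cousins-front.jpg", "1972-cousins-back.jpg", "1965-picnic-front.jpg"]
pairs, unmatched = pair_scans(files)
print(fronts_missing_backs(pairs))
```

Not every print has writing on the back, so a missing back is a prompt to check the print, not necessarily an error.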

Group Before You Scan

Before scanning, sort prints into rough piles by era, family branch, or occasion. Photos that are clearly from the same event should be scanned together with sequential filenames. This makes batch metadata work far faster because you can apply the same date, location, and event tags to a contiguous range.
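Applying shared tags to a contiguous range might look something like the sketch below. It assumes metadata is held as a list of dictionaries with a "file" key and that filenames use zero-padded sequence numbers so they sort in scan order; the field names and filenames are hypothetical.

```python
def apply_to_range(records, first, last, shared):
    """Return new records with `shared` fields merged into every record whose
    filename sorts between `first` and `last`, inclusive.

    Relies on zero-padded sequence numbers so string order matches scan order.
    The input records are not modified.
    """
    updated = []
    for rec in records:
        if first <= rec["file"] <= last:
            rec = {**rec, **shared}   # copy, then overlay the shared fields
        updated.append(rec)
    return updated


records = [
    {"file": "1972-reunion-001.tif"},
    {"file": "1972-reunion-002.tif"},
    {"file": "1975-beach-001.tif"},
]
tagged = apply_to_range(
    records,
    first="1972-reunion-001.tif",
    last="1972-reunion-002.tif",
    shared={"event": "Family reunion", "date": "1972"},
)
for rec in tagged:
    print(rec)
```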

AI-Generated Approximate Dates from Visual Cues

One of the most useful things AI can do for a family archive is estimate when a photo was probably taken based on visual evidence: clothing styles, hairstyles, car models, photo print formats, color palette, and architectural details.

What AI Can Estimate Well

Decade-level dating is usually reliable. AI can tell a 1950s family portrait from a 1980s one with high confidence based on print format, color tones, fashion, and hairstyles. It can often narrow further: "early 1970s" or "late 1980s" estimates hold up well against known reference photos.

What AI Cannot Do

AI cannot date a photo to a specific year reliably, especially for everyday snapshots. It cannot account for older relatives wearing decades-out-of-date clothing, or hand-me-downs that pushed a child's outfit a decade behind the trend. Treat AI date estimates as a starting range, never as a fact.
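One way to keep an estimate honest is to store it as a year range instead of a single year. The sketch below turns phrases such as "early 1970s" into a range. The cut-offs (early = years 0 to 3, mid = 4 to 6, late = 7 to 9) are an arbitrary assumption for illustration, not an archival standard.

```python
import re

# Assumed year offsets within a decade; adjust to taste.
OFFSETS = {None: (0, 9), "early": (0, 3), "mid": (4, 6), "late": (7, 9)}

DECADE_PHRASE = re.compile(r"(?:(early|mid|late)[\s-]+)?(\d{3})0s")


def decade_phrase_to_range(phrase):
    """Convert '1950s', 'early 1970s', 'mid-1960s', 'late 1980s' to (first_year, last_year)."""
    match = DECADE_PHRASE.fullmatch(phrase.strip().lower())
    if match is None:
        raise ValueError(f"not a decade phrase: {phrase!r}")
    part, decade_prefix = match.groups()
    decade_start = int(decade_prefix) * 10
    low, high = OFFSETS[part]
    return decade_start + low, decade_start + high


print(decade_phrase_to_range("early 1970s"))
print(decade_phrase_to_range("late 1980s"))
```

Storing the range alongside the photo makes it obvious to a later reader that the date is an estimate with a width, not a recorded year.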

AI-Generated Location Guesses

AI can sometimes recognize landmarks, regional architectural styles, and signage that places a photo in a country or region. This is genuinely useful when the back of the print says nothing and the family no longer remembers.

Useful Location Signals AI Can Read

  • Famous landmarks identifiable from a fragment in the background.
  • Architectural style that suggests region (Cape Cod, Tudor revival, brutalist apartment block).
  • Vehicle license plates when readable.
  • Signage in a specific language or alphabet.
  • Visible terrain like distinctive coastlines, mountains, or vegetation.

The risk is overclaiming. An AI confidently placing a photo in "Boston, Massachusetts" because of a brick rowhouse may be wrong by a thousand miles. Treat AI locations as candidate hypotheses to verify with relatives, not as facts to commit to your family tree.

The Difference Between Fact and Inference

This is the single most important discipline in family archive work. Confusing fact with inference is how family trees fill up with confident errors that get copied across generations of researchers.

Mark Inferred Data Clearly

If the date is written on the back of the print in your great-grandmother's handwriting, it is a fact (subject to her memory at the time, but reasonably reliable). If the date came from AI estimation, it is an inference. If a relative looking at a 60-year-old photo told you "that's Aunt Mae," it is somewhere in between.

A common archival convention is to use square brackets or "circa" for inferred values: [1972] or "circa 1972" for AI guesses, plain "1972" for verified dates. The convention extends to names: [Mae Whitfield?] for an uncertain identification.
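The bracket convention is easy to apply consistently if the verified/inferred status is stored separately and the brackets are added only at display time. A minimal sketch, with hypothetical function names:

```python
def display_date(value, verified):
    """Plain text for verified dates, square brackets for inferred ones."""
    return value if verified else f"[{value}]"


def display_name(name, certain):
    """Plain text for confirmed identifications, bracketed with '?' otherwise."""
    return name if certain else f"[{name}?]"


print(display_date("1972", verified=True))
print(display_date("1972", verified=False))
print(display_name("Mae Whitfield", certain=False))
```

Keeping the flag as data means an entry can be upgraded from inferred to verified later without retyping the value.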

Keep a Source Field

Every metadata field benefits from a corresponding source field. "Date: 1972 / Date Source: Annotation on print verso" is far more useful than "Date: 1972" alone. When future researchers question the data, the source field tells them how confident to be.
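In a spreadsheet this is simply a second column per field. In code, one possible shape is a small value-plus-source pair, sketched here with an illustrative class name:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedValue:
    """A metadata value together with where it came from."""
    value: str
    source: str


def render(field_name, sourced):
    """Format a field and its source on one line."""
    return f"{field_name}: {sourced.value} / {field_name} Source: {sourced.source}"


date = SourcedValue(value="1972", source="Annotation on print verso")
print(render("Date", date))
```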

Resist the Urge to Fill in Blanks

An empty field is often more honest than a guess. Future relatives may have access to information you do not. A confidently wrong entry is harder to correct than a blank one.

GEDCOM-Friendly Metadata

GEDCOM is the standard format for genealogical data exchange. Most family tree software, including Ancestry, MyHeritage, FamilySearch, and Gramps, can import GEDCOM files. If you want your photo archive to live alongside your family tree, the metadata you write should map cleanly into GEDCOM concepts.

Use Standard Fields

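IPTC and XMP metadata embedded in your scans should populate the standard fields: Caption, Keywords, Date Taken, Location, Creator, and Copyright. Many genealogy programs read at least some of these fields when you import images, though support varies, so test a few files with your own software before tagging a whole archive.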
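One common way to embed these fields is the ExifTool command-line program. The sketch below only builds the argument list and does not run anything. It assumes ExifTool is installed, covers caption, keywords, creator, and copyright only, and uses tag names as I understand them from the ExifTool tag tables, so check them against the ExifTool documentation before writing to real files.

```python
def exiftool_args(path, caption, keywords, creator, copyright_notice):
    """Build an ExifTool command that writes each value to both IPTC and XMP."""
    args = [
        "exiftool",
        f"-IPTC:Caption-Abstract={caption}",
        f"-XMP-dc:Description={caption}",
        f"-IPTC:By-line={creator}",
        f"-XMP-dc:Creator={creator}",
        f"-IPTC:CopyrightNotice={copyright_notice}",
        f"-XMP-dc:Rights={copyright_notice}",
    ]
    for keyword in keywords:
        # "+=" adds to the existing list instead of replacing it.
        args.append(f"-IPTC:Keywords+={keyword}")
        args.append(f"-XMP-dc:Subject+={keyword}")
    args.append(path)
    return args


command = exiftool_args(
    "1972-cousins-front.tif",
    caption="Cousins on the porch, [1972]",
    keywords=["Whitfield family", "Pennsylvania"],
    creator="Unknown",
    copyright_notice="Family collection",
)
print(command)
```

The list can be passed to subprocess.run once you are satisfied with it. Try it on copies first, not on the master TIFFs.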

Use ISO Date Format Where You Can

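1972-06 is unambiguous. "June 72" is not: software has to guess the century and may not parse it at all. Stick to YYYY, YYYY-MM, or YYYY-MM-DD as appropriate to your level of certainty. GEDCOM files write dates as day, month abbreviation, and year ("14 JUN 1972"), and ISO dates convert to that form mechanically.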
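A partial ISO date can be validated and converted to the GEDCOM style with a few lines. This is a sketch and not a full GEDCOM date implementation: it handles only YYYY, YYYY-MM, and YYYY-MM-DD, plus the "ABT" (about) prefix that GEDCOM uses for approximate dates.

```python
import datetime
import re

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

ISO_PARTIAL = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def iso_to_gedcom(iso, approximate=False):
    """Convert 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to a GEDCOM-style date string."""
    match = ISO_PARTIAL.fullmatch(iso)
    if match is None:
        raise ValueError(f"not an ISO date: {iso!r}")
    year, month, day = match.groups()
    parts = []
    if month is not None:
        month_number = int(month)
        if not 1 <= month_number <= 12:
            raise ValueError(f"invalid month in {iso!r}")
        if day is not None:
            # Raises ValueError for impossible dates such as 1972-02-30.
            datetime.date(int(year), month_number, int(day))
            parts.append(str(int(day)))
        parts.append(MONTHS[month_number - 1])
    parts.append(year)
    text = " ".join(parts)
    return f"ABT {text}" if approximate else text


print(iso_to_gedcom("1972-06"))
print(iso_to_gedcom("1972-06-14"))
print(iso_to_gedcom("1972", approximate=True))
```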

Locations Should Be Hierarchical

GEDCOM expects locations from most specific to least specific, separated by commas: "123 Main Street, Smithville, Lancaster County, Pennsylvania, USA." Even if you only know two levels, write them in this order so the software parses them correctly.
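To keep the order consistent even when some levels are unknown, you could build the string from named parts. A minimal sketch; the five level names are an assumption for illustration, and families outside the United States would choose different ones.

```python
# Most specific first, matching the order described above.
LEVELS = ("address", "city", "county", "state", "country")


def place_string(**known):
    """Join whichever levels are known, most specific first, skipping blanks."""
    unexpected = set(known) - set(LEVELS)
    if unexpected:
        raise ValueError(f"unknown place levels: {sorted(unexpected)}")
    return ", ".join(known[level] for level in LEVELS if known.get(level))


print(place_string(city="Smithville", state="Pennsylvania"))
print(place_string(address="123 Main Street", city="Smithville",
                   county="Lancaster County", state="Pennsylvania", country="USA"))
```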

Controlled Vocabularies for Family Names

"Grandma," "Nana," "Mary," "Mary E.," "Mary Whitfield," and "Mary Whitfield (née Carlson)" can all refer to the same person. If you tag photos with whichever name comes to mind, search becomes useless and your archive gets harder to use the larger it gets.

Pick One Canonical Form Per Person

Decide on the canonical name format and stick to it. "Firstname Middlename Lastname (Maidenname) [birth-death]" is one common convention. "Mary Elizabeth Whitfield (Carlson) [1898-1972]" identifies someone unambiguously across generations of a family with overlapping names.
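A small formatter keeps the convention from drifting as the archive grows. This sketch follows the format above; writing "?" for an unknown birth or death year is my own assumption, so substitute whatever your family's convention is.

```python
def canonical_name(first, last, middle=None, maiden=None, birth=None, death=None):
    """Build 'Firstname Middlename Lastname (Maidenname) [birth-death]'."""
    name = " ".join(part for part in (first, middle, last) if part)
    if maiden:
        name += f" ({maiden})"
    if birth or death:
        # "?" marks an unknown year (an assumed convention).
        name += f" [{birth or '?'}-{death or '?'}]"
    return name


print(canonical_name("Mary", "Whitfield", middle="Elizabeth",
                     maiden="Carlson", birth=1898, death=1972))
```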

Maintain a Person Index

Keep a separate document listing every person who appears in the archive, with their canonical name, life dates, relationship to the family, and any nicknames. This becomes the authoritative reference for tagging and a useful artifact in its own right.
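If the index lives in a structured form, it can also resolve whatever name comes to mind into the canonical one. The sketch below uses a hypothetical in-memory index with a single illustrative entry; in practice the same data could sit in a spreadsheet.

```python
PERSON_INDEX = [
    {
        "canonical": "Mary Elizabeth Whitfield (Carlson) [1898-1972]",
        "relationship": "grandmother",
        "aliases": ["Grandma", "Nana", "Mary", "Mary E.", "Mary Whitfield"],
    },
]


def build_alias_map(index):
    """Map each lowercased alias to a canonical name; refuse ambiguous aliases."""
    alias_map = {}
    for person in index:
        for alias in person["aliases"]:
            key = alias.strip().lower()
            existing = alias_map.get(key)
            if existing is not None and existing != person["canonical"]:
                raise ValueError(f"alias {alias!r} is used for more than one person")
            alias_map[key] = person["canonical"]
    return alias_map


def resolve(tag, alias_map):
    """Return the canonical name for a tag, or None if it is not in the index."""
    return alias_map.get(tag.strip().lower())


aliases = build_alias_map(PERSON_INDEX)
print(resolve("Nana", aliases))
print(resolve("Uncle Bob", aliases))
```

The ambiguity check matters in practice: most families have more than one "Grandma," and the index is where that gets sorted out.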

Tag Relationships, Not Just Names

"Mary Whitfield, grandmother" is more useful than "Mary Whitfield" alone, especially for descendants who never met her. Relationship tags also help future relatives understand who they are looking at without consulting the family tree.

How PhotoScanr Helps Family Archive Projects

PhotoScanr was named for this kind of work. Scanning a few hundred family photos and writing metadata for each one by hand is the kind of project that gets started and abandoned. The tool exists to keep you moving.

Batch Processing for Whole Albums

Scan a photo album, drop the images into PhotoScanr in one batch, and get descriptions, approximate dates, and location guesses generated in minutes. Pro covers 100 images per day with batches of 25; Studio (formerly Power) covers 600 per day with batches of 100. A long weekend can clear a multi-generational shoebox.

Style Preferences for Archive Voice

Configure PhotoScanr with a style preference suited to archival work: factual, neutral tone; explicit hedging on inferred data ("appears to be," "circa"); structured output that maps cleanly to your fields. Every batch arrives in a format you can paste straight into your archive.

Grounding to Reduce False Confidence

Grounding constrains the AI to claims it can support with visible evidence in the image. This is essential for archival work, where confidently wrong inferences can poison the historical record for descendants.

ZIP and Lightroom Export

Export your processed images and metadata as a ZIP with a CSV, ready to import into a genealogy database, a Lightroom catalog, or a family-shared cloud archive. The Lightroom mode in particular is well-suited for family archivists who already use Lightroom for digital photos.
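Before importing an exported CSV anywhere, it can help to stamp the AI-generated columns with a source so they are not mistaken for verified facts later. The sketch below is illustrative only: the column names ("file", "date", "location") are hypothetical and are not PhotoScanr's documented export layout, so adjust them to whatever your CSV actually contains.

```python
import csv
import io


def mark_unverified(csv_text, fields=("date", "location")):
    """Read CSV text and add a '<field>_source' column for each AI-filled field.

    Non-empty values are labelled as unverified AI estimates; empty values
    get an empty source so blanks stay blank.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in fields:
            row[f"{field}_source"] = "AI estimate (unverified)" if row.get(field) else ""
        rows.append(row)
    return rows


sample = "file,date,location\n1972-cousins-front.jpg,circa 1972,\n"
for row in mark_unverified(sample):
    print(row)
```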

For accessibility considerations as you publish family photos online, see our photo blog alt text guide.

Organizing for Descendants

The point of a family archive is not to satisfy you today. It is to give the people who come after you a coherent, searchable, trustworthy record of where they came from. That is a higher bar than a personal photo library.

Treat fact and inference as different categories. Use controlled vocabularies for names. Use standard date formats and hierarchical locations. Store master scans at archival resolution and keep the backs of prints alongside the fronts. Use AI to handle the volume problem so you do not give up before finishing, but treat its output as drafts to verify, not facts to commit.

A descendant in 50 years should be able to open your archive, search for "Mary Whitfield 1972 Pennsylvania," and find the right photo with full context. That is what good metadata looks like, and it is the difference between handing them a real archive and handing them a bigger shoebox.

Ready to Start the Family Archive?

Process scanned family photos in batches and generate first-draft metadata you can verify.
