METHODOLOGY
Digitization Standards
Materials are photographed or scanned at a minimum of 300 DPI, and archival originals are handled using conservation-grade protocols. Cover pages, title pages, tables of contents, copyright pages, and representative interior pages are captured for each item. HEIC and TIFF originals are converted to JPEG for web delivery, and full-resolution archival copies are retained.
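As a rough illustration of the 300 DPI minimum, the sketch below computes the effective resolution of a capture from its pixel dimensions and the physical size of the page. The function names, the inch-based measurements, and the idea of checking both axes are assumptions made for this example, not a description of the archive's own tooling.

```python
MIN_DPI = 300  # minimum resolution stated in the digitization standard


def effective_dpi(width_px: int, height_px: int,
                  width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical pixels-per-inch."""
    if width_in <= 0 or height_in <= 0:
        raise ValueError("physical dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


def meets_standard(width_px: int, height_px: int,
                   width_in: float, height_in: float) -> bool:
    """True when the capture reaches MIN_DPI on both axes."""
    return effective_dpi(width_px, height_px, width_in, height_in) >= MIN_DPI
```

For a hypothetical 6 x 9 inch page, a capture of 1800 x 2700 pixels sits exactly at the threshold, while 1200 x 1800 pixels falls short at 200 DPI.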
Cataloging Approach
Each item is cataloged against a structured 67-field schema covering bibliographic, physical, linguistic, cultural, and pedagogical dimensions. Where applicable, fields use controlled vocabularies (era classification, genre types, register categories) to ensure consistency across the corpus and to support precise filtering and search.
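One way to picture controlled-vocabulary enforcement is a validator that rejects terms outside an allowed set. This is only a sketch: the text names the vocabulary categories but not their values, so every term below, along with the field keys and the function name, is a placeholder.

```python
# Hypothetical vocabularies: the allowed terms are placeholders, since the
# source lists only the categories (era, genre, register), not their values.
CONTROLLED_VOCABULARIES = {
    "era": {"interwar", "postwar"},
    "genre": {"primer", "hymnal", "periodical"},
    "register": {"formal", "colloquial"},
}


def validate_controlled_fields(record: dict) -> list[str]:
    """Return one error message per controlled field holding an unknown term."""
    errors = []
    for field_name, allowed in CONTROLLED_VOCABULARIES.items():
        value = record.get(field_name)
        # Missing fields are skipped here; only present values are checked.
        if value is not None and value not in allowed:
            errors.append(
                f"{field_name}: {value!r} is not in the controlled vocabulary"
            )
    return errors
```

A record that passes returns an empty list, which makes the function easy to use as a gate before an item moves forward in review.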
Items enter the archive as "pending," move through a review stage, and are published only after human verification of key fields including title, year, material type, and copyright status.
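The pending, review, and published stages can be sketched as a small state machine that refuses to publish until the key fields are verified. The class name, the field identifiers, and the use of exceptions are assumptions for illustration. The source says the key fields include title, year, material type, and copyright status, so the real list may be longer.

```python
from dataclasses import dataclass, field

# Key fields named in the text; the actual list may include others.
KEY_FIELDS = ("title", "year", "material_type", "copyright_status")

# Allowed forward transitions between workflow states.
TRANSITIONS = {"pending": "review", "review": "published"}


@dataclass
class Item:
    status: str = "pending"
    verified: set = field(default_factory=set)  # names of verified key fields

    def advance(self) -> str:
        """Move to the next state; publishing requires every key field verified."""
        if self.status not in TRANSITIONS:
            raise ValueError(f"no transition from {self.status!r}")
        next_status = TRANSITIONS[self.status]
        if next_status == "published":
            missing = [name for name in KEY_FIELDS if name not in self.verified]
            if missing:
                raise ValueError(f"unverified key fields: {missing}")
        self.status = next_status
        return self.status
```

An item created with `Item()` advances to review freely, but a second `advance()` raises until a reviewer has added all of `KEY_FIELDS` to `verified`.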
Entity Linking
People, organizations, locations, and time periods mentioned in or associated with each item are identified and standardized. Variant names (e.g., "St. Peter's Lithuanian Parish" vs. abbreviated forms) are normalized to canonical identifiers, enabling cross-reference across the corpus. This structured entity linking powers the Connections view and the Related Records panels on item pages.
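A minimal sketch of variant-name normalization follows, assuming a simple alias table keyed on a cleaned-up form of each name. The canonical identifier, the second variant spelling, and the function names are invented for this example, and the archive's actual matching rules are not described in the text.

```python
import re
from typing import Optional


def normalize_key(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for lookup."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


# Hypothetical canonical identifier mapped to known variant spellings.
CANONICAL_VARIANTS = {
    "org:st-peters-lithuanian-parish": [
        "St. Peter's Lithuanian Parish",
        "St. Peter's Parish",
    ],
}

# Reverse index: normalized variant -> canonical identifier.
ALIASES = {
    normalize_key(variant): canonical_id
    for canonical_id, variants in CANONICAL_VARIANTS.items()
    for variant in variants
}


def resolve_entity(name: str) -> Optional[str]:
    """Return the canonical identifier for a name, or None if unknown."""
    return ALIASES.get(normalize_key(name))
```

Because lookup goes through `normalize_key`, differences in case, punctuation, and spacing resolve to the same identifier, while unrecognized names return `None` and can be routed to human review.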
Preservation and Interoperability
Structured metadata is stored in Postgres with full provenance tracking. Citations link to specific source documents. The schema is designed for long-term archival durability and can be exported to standard formats for research integration. All relationships shown in the public interface can be explained in plain language and traced to source materials.
Rights and Access
Each item carries a copyright status field determined by publication date, publisher identity, and known rights holders. Items may be categorized as Public Domain, In Copyright (metadata-only access), or Academic & Research Use (fair use context). Copyright status is verified during the review process and displayed prominently on item pages. When status is uncertain, items are marked as "Status Being Verified" until resolved.
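The status categories above could be modeled as an enumeration with a fallback label for unresolved items. This is a sketch only: the names are hypothetical, `None` is used here to stand for an unresolved status, and treating unresolved items as metadata-only is an assumption that the text does not state.

```python
from enum import Enum
from typing import Optional


class CopyrightStatus(Enum):
    PUBLIC_DOMAIN = "Public Domain"
    IN_COPYRIGHT = "In Copyright"
    ACADEMIC_RESEARCH = "Academic & Research Use"


PENDING_LABEL = "Status Being Verified"


def display_label(status: Optional[CopyrightStatus]) -> str:
    """Label shown on the item page; None stands for an unresolved status."""
    return status.value if status is not None else PENDING_LABEL


def metadata_only(status: Optional[CopyrightStatus]) -> bool:
    """True when only metadata should be exposed.

    In Copyright items are metadata-only per the text; applying the same
    restriction to unresolved items is an assumption of this sketch.
    """
    return status is None or status is CopyrightStatus.IN_COPYRIGHT
```

Keeping the pending label outside the enumeration means an item cannot be saved with "Status Being Verified" as if it were a final determination.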
Scope and Limitations
Phase 1 covers printed books and periodicals from the Žiburio Lithuanian Heritage School collection and related Dievo Apvaizdos Parish holdings, with emphasis on materials from 1900–1990. Photographs, audio recordings, and archival video are planned for Phase 2. Entity relationships in Phase 1 are derived from structured text fields; a full CCO-aligned entity graph is planned for Phase 2.