METHODOLOGY
Built by Students, Powered by AI
How Lithuanian-American high school students are building a professional digital archive with their phones

The Žiburio Archive comprises approximately 4,000 Lithuanian diaspora materials accumulated over nearly a century. The majority are between 50 and 80 years old. Many were printed on poor-quality postwar paper; crumbling spines, fading type, and missing pages are common throughout the collection. Without systematic cataloging, a significant portion of these materials will become permanently inaccessible within one to two decades. Concurrently, the generation that produced, published, and read these works is passing — and with it, irreplaceable knowledge of provenance, purpose, and cultural context.
This project constitutes a direct response to both forms of loss. The methodology has been designed to meet archival standards while remaining operationally accessible to high school students working with mobile devices. This is a deliberate pedagogical choice: direct engagement with primary sources is among the most effective means of forming a young person’s relationship with their cultural heritage. The technology enables that engagement to occur at a professional standard of documentation.
The Process
Each intern selects a book from the sorted collection, opens it, and documents it through a sequence of approximately ten photographs submitted via a purpose-built mobile application. These photographs are not incidental illustrations — they constitute the primary documentary input from which all subsequent metadata is derived.
The photographic protocol is structured to train the intern in systematic bibliographic observation. Each photograph targets a specific element of the book, and the sequence itself teaches the student where to look and what to notice.
Not every book contains all elements. A hymnal may lack a table of contents; a pamphlet may have no colophon. The intern learns to recognize what is present and what is absent — and both carry information. A missing copyright page in a 1948 DP camp publication tells its own story about the conditions of production.
This is not data entry. The protocol teaches a way of reading that most people — including adults — have never practiced: how to examine a book as a physical object with its own history. Where was it printed? Who approved it for publication? Whose handwriting is inside the cover? What does the paper quality reveal about the circumstances of its production? By the third or fourth book, interns begin noticing details unprompted — a library stamp from a DP camp, a hand-corrected erratum, a price written in occupation-era currency. The photographic protocol is the scaffold; the observational skill it cultivates is the lasting outcome.
No scanner is required. No workstation. A phone and a Saturday morning are sufficient to begin. Professional scanning enters the workflow at a later phase, when the project advances to archival-quality digitization with OCR full-text search. However, the structured catalog — the layer that renders materials findable, searchable, and interconnected — is built entirely from phone photographs.
The choice of the phone as the primary instrument is not a concession to limited resources — it is a pedagogical decision. The phone is the native medium of this generation. It is where they communicate, where they create, where they are most fluent. By placing a purpose-built application on a device they already carry, the project meets young people in the environment they inhabit and redirects the tool they know best toward materials they have never encountered. A leaderboard tracks contributions and surfaces competition. The interface speaks the language of the technology these students already use — and through it, they engage with materials printed decades before their parents were born. The newest technology applied to the oldest materials. That juxtaposition is the point.


Stabilization and Sorting
Prior to any cataloging, the collection requires physical stabilization. Many materials had been preserved for decades under informal conditions: stored in mixed containers, without chronological order or identification. Before documentation can proceed, the collection must be made physically legible.
Students in the Skaitmeniniai Knygnešiai (Digital Book Smugglers) internship begin this work directly with the physical collection. Books are removed from storage containers and sorted by approximate historical period, mapped to six waves of Lithuanian emigration. This establishes the first chronological structure across the collection and creates the conditions for systematic cataloging.


Vincas
From the submitted photographs, Vincas — the archive's AI system, named for Vincas Kudirka — automatically extracts a structured set of metadata fields, each accompanied by a confidence score. Vincas proposes field values derived from visual content. It does not finalize any field, does not assign cultural or historical context independently, and does not publish. Its function is to compress the interval between a photographed book and a structured catalog proposal — not to replace archival judgment.
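The shape of such a proposal can be sketched in a few lines of Python. This is an illustration only, not the archive's actual data model: the class names, the field names, the 0-to-1 confidence scale, and the 0.8 attention threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedField:
    """One metadata value proposed from the photographs (hypothetical shape)."""
    name: str
    value: str
    confidence: float        # assumed scale: 0.0 (guess) to 1.0 (certain)
    confirmed: bool = False  # in this sketch, only a human reviewer sets this


@dataclass
class CatalogProposal:
    """A catalog proposal: a set of proposed fields, none of them final."""
    item_id: str
    submitted_by: str
    fields: list[ProposedField] = field(default_factory=list)

    def needs_attention(self, threshold: float = 0.8) -> list[str]:
        """Return names of fields whose confidence is below the threshold."""
        return [f.name for f in self.fields if f.confidence < threshold]


# Placeholder values, not real catalog data.
proposal = CatalogProposal(
    item_id="example-0001",
    submitted_by="Student A",
    fields=[
        ProposedField("title", "Example title", 0.97),
        ProposedField("date", "1948", 0.62),
        ProposedField("place_of_publication", "Example place", 0.55),
    ],
)

flagged = proposal.needs_attention()
print(flagged)  # ['date', 'place_of_publication']
```

Every field starts unconfirmed regardless of its score, which mirrors the principle that the system proposes and a person decides. The confidence value only indicates where a reviewer might look first.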
The relationship between intern and system is reciprocal. The intern teaches Vincas to notice details the AI missed — a handwritten inscription, an ownership stamp, a dedication. Vincas surfaces context the intern did not possess: when the imprimatur reads “Kaunas, 1943,” that is a conversation about occupation; when the orthography employs “sz” in place of “š,” that is a conversation about the press ban. Each book processed deepens understanding on both sides of the interface.
Review and Publication
Every record proposed by Vincas enters a review dashboard, where its key fields (title, date, material type, copyright status, cultural context) are verified against the source photographs prior to publication. Nothing is published automatically. Human oversight is present at every stage of the workflow: the student photographs and submits, the system processes and proposes, the archive curator verifies and publishes.
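One way to picture this division of responsibility is as a small state machine in which each step can be taken only by a designated role. The sketch below is hypothetical: the stage names, the role labels, and the `advance` function are invented for illustration and do not describe the dashboard's real implementation.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted"  # student has photographed and submitted
    PROPOSED = "proposed"    # system has proposed field values
    VERIFIED = "verified"    # curator has checked the key fields
    PUBLISHED = "published"  # record is public


# For each stage: the next stage and the only role allowed to move there.
TRANSITIONS = {
    Stage.SUBMITTED: (Stage.PROPOSED, "system"),
    Stage.PROPOSED: (Stage.VERIFIED, "curator"),
    Stage.VERIFIED: (Stage.PUBLISHED, "curator"),
}


def advance(stage: Stage, actor: str) -> Stage:
    """Move a record one step forward if the actor holds the required role."""
    if stage not in TRANSITIONS:
        raise ValueError(f"'{stage.value}' is a final stage")
    next_stage, required_actor = TRANSITIONS[stage]
    if actor != required_actor:
        raise PermissionError(
            f"only the {required_actor} may move a record to '{next_stage.value}'"
        )
    return next_stage


stage = advance(Stage.SUBMITTED, "system")  # the system proposes
try:
    advance(stage, "system")                # the system cannot verify its own work
except PermissionError as err:
    print(err)
stage = advance(stage, "curator")           # the curator verifies
stage = advance(stage, "curator")           # the curator publishes
print(stage.value)
```

Because no transition skips a stage and the system's role ends at the proposal, there is no path in this sketch from a submitted photograph to a published record that bypasses the curator.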
The Cataloging Framework
Each item in the archive is cataloged according to the CultureNet framework — a structured methodology that goes beyond standard bibliographic data to address historical, linguistic, cultural, and pedagogical dimensions. Every record carries period assignment, language register analysis, pedagogical recommendations, condition assessment, copyright status, and entity links connecting persons, organizations, locations, and events across the entire collection. The result is not a list of books but a knowledge system — one that reveals the networks, institutions, and relationships that shaped Lithuanian cultural life in the diaspora.
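The difference between a list and a knowledge system lies in the entity links. A minimal sketch, assuming each record carries a set of (kind, name) entity pairs, shows how shared entities connect otherwise separate items. The record identifiers, entity names, and both helper functions are placeholders invented for this example, not part of the CultureNet framework itself.

```python
from collections import defaultdict

# Hypothetical records: item id -> linked entities as (kind, name) pairs.
records = {
    "item-1": {("person", "Person A"), ("organization", "Press X")},
    "item-2": {("organization", "Press X"), ("location", "City Y")},
    "item-3": {("person", "Person A"), ("location", "City Y")},
}


def build_entity_index(records):
    """Invert the records: map each entity to the set of items that cite it."""
    index = defaultdict(set)
    for item_id, entities in records.items():
        for entity in entities:
            index[entity].add(item_id)
    return index


def related_items(item_id, records, index):
    """Return the other items that share at least one entity with item_id."""
    related = set()
    for entity in records[item_id]:
        related |= index[entity]
    related.discard(item_id)
    return related


index = build_entity_index(records)
print(sorted(index[("organization", "Press X")]))       # ['item-1', 'item-2']
print(sorted(related_items("item-1", records, index)))  # ['item-2', 'item-3']
```

Looking up a single press or person returns every item it touches, and following those links outward from one book is what exposes the networks described above.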
Discovery and Attribution
The interns operate within a competitive structure: who located the oldest volume, who identified the most unusual title, who surfaced a publication from a press no one had previously documented. Discovery itself serves as the measure of engagement, and the competitive framework is deliberately designed to reward depth of observation over speed of processing.

Every published record permanently credits the student who discovered and submitted the material. That attribution does not expire, is not anonymized, and travels with the record indefinitely. A student who identified a 1929 textbook is the named contributor to that archival record — a record that may subsequently appear in scholarly citations, institutional histories, and research databases.
The Skaitmeniniai Knygnešiai students are the archive's builders: primary interpreters of historical material working with contemporary tools. Their names belong to these records in the same way that the names of editors, publishers, and institutions belong to the materials themselves.