A team of volunteers from Accenture has built an artificial intelligence (AI)-based solution that helps extract information on victims of Nazi persecution from documents in the Arolsen Archives 40 times faster than previous efforts.
The Arolsen Archives preserve the world’s largest collection of documents on Nazi persecution — 110 million documents and digital objects, a portion of which are part of UNESCO’s Memory of the World program — to keep the memory of the crimes of the German terror regime alive. An essential part of the Archives’ work is to make these documents accessible to all who wish to search for traces of Holocaust victims and survivors, persecution of minorities and forced labor.
Every document maintained in the archives needs to be reviewed and its information (e.g., the family name and birth date on a prisoner registration form) put into a database. To facilitate this process, the Arolsen Archives established “#everynamecounts,” a crowdsourcing project for volunteers to extract information from documents manually.
Translating, reading, transcribing, cataloging and validating these documents by hand could take decades. Each document is indexed independently by three volunteers and, if their entries don’t match, reviewed for accuracy by an Arolsen Archives employee. In effect, it can take up to four people one hour to index and validate four documents.
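The review rule described above (three independent entries, with a mismatch escalated to an Archives employee) can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the project's actual tooling: the function name `reconcile`, the whitespace and case normalization, and the sample values are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def reconcile(entries: Sequence[str]) -> Tuple[Optional[str], bool]:
    """Compare independent transcriptions of one form field.

    Returns (agreed_value, needs_staff_review). Ignoring surrounding
    whitespace and letter case is an assumption of this sketch; the
    real project may compare entries differently.
    """
    cleaned = [entry.strip() for entry in entries]
    distinct = {value.casefold() for value in cleaned}
    if len(distinct) == 1:
        # All volunteers agree: accept the value without staff review.
        return cleaned[0], False
    # Entries differ (or none were given): escalate to an employee.
    return None, True


# Hypothetical sample values, not taken from any archive document.
print(reconcile(["Example", "example ", "Example"]))
print(reconcile(["Example", "Exampel", "Example"]))
```

In this sketch a single differing entry is enough to trigger a review, which matches the "if the entries don’t match" condition rather than a majority vote.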
Ian Lever, an Accenture volunteer and a member of the company’s Jewish Employee Resource Group, quickly realized that AI could accelerate this process significantly. Within 10 weeks, he and other Accenture volunteers set up an AI solution to index the documents. Because the AI captures the information faster and more accurately, four volunteers can now validate approximately 160 documents in one hour, a 40-fold increase in productivity.
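The 40-fold figure follows directly from the two throughput numbers quoted: four people handling four documents per hour by hand, against four volunteers validating approximately 160 documents per hour with the AI. A minimal check of that arithmetic, with variable names chosen here purely for illustration:

```python
# Figures as quoted in the text; team size is four people in both cases.
TEAM_SIZE = 4
manual_docs_per_hour = 4      # indexed and validated by hand
assisted_docs_per_hour = 160  # validated with the AI (approximate)

# Throughput ratio for the same team over the same hour.
speedup = assisted_docs_per_hour / manual_docs_per_hour

# Per-person rates, for comparison.
manual_per_person = manual_docs_per_hour / TEAM_SIZE
assisted_per_person = assisted_docs_per_hour / TEAM_SIZE

print(speedup)              # 40.0
print(manual_per_person)    # 1.0
print(assisted_per_person)  # 40.0
```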
Working with Accenture’s Solutions.AI team, the volunteers configured an existing Accenture AI solution, which uses optical character recognition and machine learning technology. It indexes documents that are particularly difficult and tedious for humans to process. These include prisoner and transfer lists with dozens of rows, concentration camp records and tracing documents, which are inquiries about the locations and fates of family members and loved ones.
Even though the AI does the heavy lifting, human oversight of the process remains important, not just to ensure accuracy but also to keep the AI solution learning. By reviewing and correcting information, volunteers “teach” the solution to recognize handwritten characters and abbreviations that were typical for the time. Thanks to their inputs, the AI has gradually improved its precision by 10% for the “mother’s last name” form field. For the “religion” field, the AI is now operating at 99% confidence.
Since Accenture implemented the AI solution in December 2021, the solution has indexed more than 160,000 names of Nazi persecution victims, extracted information from more than 18,000 documents and clustered more than 60,000 documents into similar groups to improve identification and analysis.
More than 950 Accenture people have volunteered for the project to date, and Accenture also supports the maintenance and further development of the AI solution.
“We are proud of our people’s efforts to help keep alive the memories of those who endured unimaginable pain and suffering, at a time when antisemitism, racism and ultra-nationalism are rearing their ugly heads again,” said David Metnick, a managing director and executive sponsor of the project at Accenture. “We saw a problem and, in it, an opportunity to live our values and use digital technology for good.”
“We are overwhelmed by how many volunteers support digitizing our archive,” said Floriane Azoulay, Director of the Arolsen Archives. “Our collaboration with the Accenture team stands out. It is fantastic that there is now a digital solution to capture the content of documents faster, which helps make more important information about the fates of Nazi persecution victims findable in our online archive.”