Build, Access, Analyze: Introducing ARCH (Archives Research Compute Hub)

We are excited to announce the public availability of ARCH (Archives Research Compute Hub), a new research and education service that helps users easily build, access, and analyze digital collections computationally at scale. ARCH builds on more than a decade of Internet Archive experience supporting computational research, through large-scale data provision to researchers and dataset-oriented service integrations like ARS (Archive-It Research Services), and on a collaboration with the Archives Unleashed project of the University of Waterloo and York University. Development of ARCH was generously supported by the Mellon Foundation.

ARCH Dashboard

What does ARCH do?

ARCH helps users easily conduct and support computational research with digital collections at scale (e.g., text and data mining, data science, digital scholarship, and machine learning). Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those datasets. In line with best practices in reproducibility, ARCH supports open publication and preservation of user-generated datasets. ARCH is currently optimized for working with tens of thousands of web archive collections, covering a broad range of subjects, events, and timeframes, and the platform is actively expanding to include digitized text and image collections. ARCH also works with various portions of the overall Wayback Machine global web archive, totaling 50+ PB and going back to 1996, which represents an extensive archive of contemporary history and communication.

ARCH, In-Browser Visualization

Who is ARCH for? 

ARCH is for any user who seeks an accessible approach to working with digital collections computationally at scale. Possible users include, but are not limited to, researchers exploring disciplinary questions, educators seeking to foster computational methods in the classroom, journalists tracking changes in web-based communication over time, and librarians and archivists seeking to support the development of computational literacies across disciplines. Recent research efforts making use of ARCH include, but are not limited to, analysis of COVID-19 crisis communications, health misinformation, Latin American women’s rights movements, and post-conflict societies during reconciliation.

ARCH, Generate Datasets

What are core ARCH features?

Build: Leverage ARCH capabilities to build custom research collections that are well scoped for specific research and education purposes.

Access: Generate more than a dozen different research-ready datasets (e.g., full text, images, PDFs, and graph data) from digital collections with the click of a button. Download generated datasets directly in-browser or via API.

Analyze: Easily work with research-ready datasets in interactive computational environments and applications like Jupyter Notebooks, Google Colab, Gephi, and Voyant, and produce in-browser visualizations.

Publish and Preserve: Openly publish datasets in line with best practices in reproducible research. All published datasets will be preserved in perpetuity.

Support: Make use of synchronous and asynchronous technical support, online trainings, and extensive help center documentation.
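
To give a feel for the Analyze step, a downloaded dataset might be explored in a Jupyter Notebook along the following lines. This is only a minimal sketch and not ARCH's own tooling: the CSV layout, the `crawl_date` and `url` column names, the sample rows, and the `count_domains` helper are all hypothetical and chosen purely for illustration. Actual ARCH datasets may be structured differently.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def count_domains(csv_file, url_column="url"):
    """Count how many rows of a CSV dataset fall under each web domain.

    The column name is an assumption; adjust it to the dataset at hand.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # urlsplit only fills in netloc when the URL carries a scheme,
        # so rows without a recognizable host are skipped.
        host = urlsplit(row[url_column]).netloc.lower()
        if host:
            counts[host] += 1
    return counts


# Tiny in-memory stand-in for a downloaded dataset file (hypothetical layout).
sample = io.StringIO(
    "crawl_date,url\n"
    "20200315,https://example.org/news\n"
    "20200316,https://example.org/about\n"
    "20200316,https://example.com/\n"
)

domain_counts = count_domains(sample)
for domain, n in domain_counts.most_common():
    print(domain, n)
```

With a real file, the `io.StringIO` stand-in would be replaced by something like `open("dataset.csv", newline="", encoding="utf-8")`, where the file name is again just a placeholder.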

How can I learn more about ARCH?

To learn more about ARCH, please reach out via the following form