Website for the "Documenting Web Data for Social Research (#DocuWeb22)" Workshop at WebSci2022 conference


DocuWeb22 - Documenting Web Data for Social Research

A participatory workshop for developing structured and reusable practices

A half-day, in-person workshop at WebSci’22 in Barcelona! The workshop takes place on Sunday, June 26. We encourage participants to attend in person in order to add to the debate, but will also make the first part of the workshop (introduction + keynote) available for online participation.

We are very happy to announce that Prof. Libby Hemphill will give a keynote talk on her experiences with establishing a Social Media Archive (SOMAR) at ICPSR. Additionally, Dr. Ali Hürriyetoğlu will present an invited case study, showcasing the research design for the collection of a large dataset of Turkish Twitter users and reflecting on how error frameworks offered guidance for this endeavor.

Workshop Description

With this half-day workshop, we want to collaboratively discover best practices as well as frequent pitfalls encountered when working with Web data. While the documentation of datasets is a well-established standard in a wide range of disciplines, the documentation of Web data comes with a number of unique challenges, among them the potential influence of the collection process on the composition of the resulting dataset and the use of automated processing methods that might not generalize well to the problem at hand.

During the first part of the workshop, we will introduce and discuss existing frameworks and guidelines for research with Web data. We will cover approaches targeted at documenting specific parts of the research pipeline (datasets, models), approaches that encourage critical reflection inspired by the social sciences (total survey error), as well as a combination of the two.

In the second part of the workshop, we hope to interactively explore the different experiences and perspectives of the participants on the collection and documentation of Web data. We would therefore like to encourage everyone to follow our Call for Participation and share their views on the topic with us and the other participants. Submitting an abstract is not a requirement for participating in the workshop. Lastly, we will collectively work on the documentation of a typical Web data collection process, discovering and discussing potential limitations and sources of error along the way.

Call for Participation (Abstracts)


Please note: Our workshop starts at 14:30 CEST/UTC+2, ahead of the regular Afternoon session schedule.

| Session | Time CEST/UTC+2 (tentative) |
| --- | --- |
| Session 1: Invited keynote talk by Prof. Libby Hemphill + Q&A | 14:30 - 15:30 |
| Break | 15:30 - 15:50 |
| Session 2: Introduction to existing guidelines for research with Web data | 15:50 - 16:30 |
| Session 3: Invited case study “Recent developments in the utilization and protection of text data collected from the web” | 16:30 - 17:00 |
| Break | 17:00 - 17:20 |
| Session 4: Short presentations of example cases from participants | 17:20 - 17:45 |
| Session 5: Applying guidelines for documenting limitations of research designs to selected cases (small working groups) | 17:45 - 18:30 |
| Final group discussion, lessons learned and closing | 18:30 - 19:00 |

Speaker Bios

Keynote: Prof. Libby Hemphill is a professor at the University of Michigan, where she is associated with the School of Information, the Institute for Social Research and the Center for Social Media Research. At ICPSR, where she also serves as director of the Resource Center for Minority Data, Libby recently established the Social Media Archive (SOMAR). Her past research addresses various aspects of the data curation process as well as the data management practices of social media researchers, making her the ideal fit for our keynote.

Invited Case Study: Dr. Ali Hürriyetoğlu is a postdoctoral research fellow at KNAW in the Odeuropa project, working on historical multilingual text processing. Between 2017 and 2021, Dr. Hürriyetoğlu was a postdoctoral research fellow at Koc University in the European Research Council (ERC) projects “Emerging Welfare” (EMW) and “Social ComQuant: Excelling in Computational and Quantitative Social Sciences in Turkey”. He performed research on extracting actionable information from social media in the scope of his Ph.D. studies at Radboud University. Throughout his career, he has worked in industrial, governmental, and academic settings to process news and social media text in various domains. His recent research focuses on the robustness and generalizability of text processing systems across contexts. Since 2019, Dr. Hürriyetoğlu has been proposing challenges and organizing shared tasks on socio-political event extraction in the scope of CLEF, LREC, ACL, and EMNLP.


Leon Fröhling is a doctoral researcher in the department of Computational Social Science at GESIS, Leibniz Institute for Social Sciences, Cologne. He is interested in studying the ways in which theoretically derived research frameworks may be transferred into actual research practice.

Indira Sen is a doctoral researcher in Computer Science in the department of Computational Social Science at GESIS, Leibniz Institute for Social Sciences, Cologne. Her interest lies in understanding biases in inferential studies from digital traces, with a focus on natural language processing.

Dr. Katrin Weller is an information scientist who leads the Digital Society Observatory team at GESIS’ Computational Social Science Department and co-leads the Research Data and Methods Dept. at the Center for Advanced Internet Studies (CAIS).

How to attend

Register for the WebSci’22 conference! If you can’t or don’t want to travel to Barcelona for onsite participation, options for online participation are available as well. If you are only interested in joining our workshop, you can also choose to register for the workshops only, either online or onsite. There are also still some reduced/free tickets available for researchers who qualify for the fair access programs. For all others: if you want to join only for the keynote but not for the rest of the workshop/conference, please email us to explore free access options.