DocuWeb22

Website for the "Documenting Web Data for Social Research (#DocuWeb22)" workshop at the WebSci 2022 conference


DocuWeb22 - Documenting Web Data for Social Research

A participatory workshop for developing structured and reusable practices

A half-day, in-person workshop at WebSci’22 in Barcelona! The workshop takes place on Sunday, June 26. We encourage participants to attend in person so that they can contribute to the debate, but we will also make the first part of the workshop (introduction and keynote) available for online participation.

Workshop Description

With this half-day workshop, we want to collaboratively identify best practices as well as frequent pitfalls encountered when working with Web data. While documenting datasets is a well-established standard in a wide range of disciplines, documenting Web data comes with a number of unique challenges. Among them are the potential influence of the collection process on the composition of the resulting dataset, and the use of automated processing methods that might not generalize well to the problem at hand.

During the first part of the workshop, we will introduce and discuss existing frameworks and guidelines for research with Web data. We will cover approaches aimed at documenting specific parts of the research pipeline (datasets, models), approaches inspired by the social sciences that encourage critical reflection (total survey error), and approaches that combine the two.

In the second part of the workshop, we hope to interactively explore the participants' different experiences with and perspectives on the collection and documentation of Web data. We therefore encourage everyone to respond to our Call for Participation and share their views on the topic with us and the other participants (submission is, however, not required to participate in the workshop). Finally, we will work together on documenting a typical Web data collection process, identifying and discussing potential limitations and sources of error along the way.

Call for Participation (Abstracts)

Schedule

Session | Duration (tentative)
Opening and introductions | 15 minutes
Session 1: Introduction to existing guidelines for research with Web data | 30 minutes
Session 2: Invited keynote talk + Q&A | 45 minutes
Session 3: Short presentations of example cases from participants, Part I | 25 minutes
Break | 30 minutes
Session 4: Short presentations of example cases from participants, Part II | 25 minutes
Session 5: Applying guidelines for documenting limitations of research designs to selected cases (small working groups) | 45 minutes
Final group discussion, lessons learned and closing | 25 minutes

Keynote Speaker

More info coming soon, stay tuned!

Organizers

Leon Fröhling is a doctoral researcher in the Department of Computational Social Science at GESIS, Leibniz Institute for the Social Sciences, Cologne. He is interested in how theoretically derived research frameworks can be transferred into actual research practice.

Indira Sen is a doctoral researcher in Computer Science in the Department of Computational Social Science at GESIS, Leibniz Institute for the Social Sciences, Cologne. Her interest lies in understanding biases in inferential studies based on digital traces, with a focus on natural language processing.

Dr. Katrin Weller is an information scientist who leads the Digital Society Observatory team in the Computational Social Science Department at GESIS. She is also co-lead of the Research Data and Methods Department at the Center for Advanced Internet Studies (CAIS).