Reproducible Quantitative Data Science
Provider: Faculty of Science

Activity no.: 5187-25-02-02
There are 21 available seats
Enrollment deadline: 01/06/2025
Place: DIKU, University of Copenhagen, 2200 København N.
Date and time: 19.06.2025, at: 00:00 - 10.11.2025, at: 16:00
Regular seats: 30
Activity Prices:
  - Participant, others: 3,000.00 kr.
  - Participant enrolled at Danish universities: 0.00 kr.
Lecturers: Melanie Ganz-Benjaminsen, Cyril Raymond Pernet
ECTS credits: 2.50
Contact person: Melanie Ganz-Benjaminsen    E-mail address: ganz@di.ku.dk
Enrolment Handling/Course Organiser: Melanie Ganz-Benjaminsen    E-mail address: ganz@di.ku.dk
Teaching language: English
Exam requirements: Active participation during lectures, homework assignments, and a presentation on the final day
Exam form: Other
Grading scale: Passed / Not passed
Course workload
Course workload category: Hours
Lectures: 35.00
Exercise(s): 28.00
Preparation: 6.00

Sum: 69.00


Content
The course runs over 5 days plus independent work: 2 days, course work, 2 days, course work, and 1 day with presentations.
The 5 physical days can be structured as follows: 2 days in the lecture-free week of block 4, 2 days in the fall holiday, and 1 day at the start of December.

Day 1 - Data Collection and data storage:
Date: 19th of June 2025, 09.00-16.00
Venue: TBA

- Introduction to reproducibility: Definitions, issues and origins - lecture (2h)
- Data provenance: keeping track of where data come from - lecture and exercises (1h)
- How do you store data on your computer? Data structures and data naming - lecture and exercises (1h)
- Ethics and GDPR - lecture and practical case reviews (2h)

Day 2 - Reproducible designs, protocols and pre-registration:
Date: 20th of June 2025, 09.00-16.00
Venue: TBA

- Concepts and tools for protocol documentation, and study pre-registration - lecture (1.5h)
- Case studies - exercise (1h)
- Using Markdown for documentation - practical (0.5h)
- Version control and social coding with Git and GitHub - practical (3h)

Course work (10 hours):

- Using your PhD research data, protocol, code, etc., write a report (minimum 3 pages) describing your starting point: which measures are already in place to increase reproducibility, in terms of the concepts presented on days 1 and 2? What further measures can be taken to increase reproducibility and, if some cannot be implemented, why not?

Day 3 - Better coding:
Date: 21st of August 2025, 09.00-16.00
Venue: TBA

- Literate programming - lecture and exercises (1h)
- Good coding practices - lecture and exercises (2h)
- Time to update your code - practical using students' own analysis scripts (3h)

Day 4 - Better analyses:
Date: 22nd of August 2025, 09.00-16.00
Venue: TBA

- P-hacking your data - lecture (1h)
- Encapsulate code for reproducibility using containers (2h)
- An introduction to computational analysis methods: permutation, bootstrap, cross-validation, out-of-sample generalization - lecture and exercises (3h)
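
As a rough illustration of two of the resampling methods named for Day 4, the sketch below shows a permutation test for a difference in group means and a percentile bootstrap confidence interval for a mean. It is not taken from the course materials: the function names, default parameters and example data are hypothetical, and only the Python standard library is assumed. A fixed random seed is used so that repeated runs give the same result.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed: the same call gives the same p-value
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # add-one correction so the estimated p-value is never exactly zero
    return (count + 1) / (n_perm + 1)


def bootstrap_ci(x, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of x."""
    rng = random.Random(seed)
    # resample with replacement, keeping the original sample size
    means = sorted(mean(rng.choices(x, k=len(x))) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    group_a = [1, 2, 3, 4, 5]       # made-up example data
    group_b = [11, 12, 13, 14, 15]  # made-up example data
    print("p-value:", permutation_test(group_a, group_b))
    print("95% CI for mean of group_a:", bootstrap_ci(group_a))
```

Recording the seed together with the code is one simple way to make resampling results repeatable; other choices (number of resamples, test statistic, interval type) depend on the analysis at hand.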

Course work (18 hours):

- Make a copy of existing code that you have used and/or that is used in your lab, and improve its reproducibility using any of the tools reviewed during the course: from better inline documentation and variable coding to updated analyses.
- Prepare a 10-minute presentation summarizing all of your course work and the measures you have taken to improve reproducibility in your PhD.

Day 5 - Data sharing:
Date: 10th of November 2025, 09.00-16.00
Venue: TBA

- The ‘data’ cycle, sharing from raw data to figures - lecture (1h)
- Reproducible publishing - a case study (1h)
- Presentations and discussions/social event with drinks and pizza (4h)

Aim and content
The Reproducible Quantitative Data Science course introduces key concepts, tools and analysis methods for reproducible data analysis in any type of quantitative research study. It is meant as a hands-on crash course in reproducible data analysis for PhD students.
In the course, we will cover research data management and best practices for data before introducing the concepts of reproducible designs, protocols and pre-registration of research studies. Next, we will turn to literate programming and good coding practices, with a focus on improving the students' own code to make it more reproducible. This includes using version control and encapsulating code in containers. We will then address issues in the actual data analysis and cover computational analysis methods such as permutation, bootstrap, cross-validation and out-of-sample generalization. We finish the course by introducing the topic of reproducible publishing.

Formal requirements
We expect students to join the course several months after starting their PhD, so that they already have data and some code. This will allow them to apply the concepts covered to their own data and code.
We assume that the students have some experience with programming, because analyses can only be reproduced using code, not a graphical interface. We will try to be as language-agnostic as possible, but prior exposure to bash/git, Matlab or Python is a plus.
During the course, active participation is expected, including sharing an example of code written by the student for code review.

Learning outcome
Knowledge:
> Understand the concepts of reproducible designs, protocols and pre-registration of research studies
> Understand good coding practices
> Understand computational analysis methods such as permutation, bootstrap, cross-validation and out-of-sample generalization

Skills:
> Version control and social coding
> Develop literate programming and good coding practices
> Encapsulate code for reproducibility using containers

Competences:
> Propose measures to increase reproducibility in their own PhD research data analysis
> Prepare a manuscript in a reproducible fashion

Literature
We have a Zotero group with all the course literature. To be added to the group, e-mail the course responsible, Melanie Ganz-Benjaminsen (ganz@di.ku.dk).

Target group
The number of participants is limited to 30, and priority will be given to PhD students from UCPH-SCIENCE and UCPH-SUND.

Teaching and learning methods
Students need to prepare before the course by going through the provided reading material.

During the physical meeting days, we intersperse lectures with exercises. A full overview of our teaching materials is publicly available on GitHub:
https://github.com/CPernet/ReproducibleQuantitativeDataScience

Between the physical meetings, the students will work individually on exercises.

Lecturers

UCPH lecturers
> Senior Scientist Cyril Pernet  https://di.ku.dk/CP
> Associate Prof. Melanie Ganz-Benjaminsen  https://research.ku.dk/MG-B

Guest lecturers

Physical visitors:
> Russ Poldrack, Stanford University, poldrack@stanford.edu
> Robert Oostenveld, Radboud University, r.oostenveld@donders.ru.nl
> Michael Hanke, Forschungszentrum Jülich GmbH, m.hanke@fz-juelich.de

Zoom lecturers from the US/Canada:
> Jean Baptiste Poline, McGill
> Ariel Rokem, University of Washington



Remarks
There is no participation fee for PhD students enrolled at a Danish institution/Danish university.
All other students are required to pay the participation fee of 3000 DKK.
