Reproducible Quantitative Data Science
Provider: Faculty of Science

Activity no.: 5187-24-02-02 
Enrollment deadline: 01/05/2024
Date and time: 13.06.2024 at 09:00 - 02.12.2024 at 16:00
Number of sessions: 5
Regular seats: 30
Course fee: 3,000.00 kr.
ECTS credits: 2.50
Contact person: Amanda Lybke Rasmussen, e-mail address: amra@di.ku.dk
Enrolment handling/Course organiser: Amanda Lybke Rasmussen, e-mail address: amra@di.ku.dk
Teaching language: English
Exam requirements: Active participation in the course days and completion of the independent work assignments.
Exam form: Other
Grading scale: Passed / Not passed
Course workload (hours)
Lectures: 35.00
Independent work: 28.00
Sum: 63.00


Content
The course runs over 5 days plus independent work: 2 days of teaching, course work, 2 more days of teaching, course work, and a final day with presentations.

Day 1 - Data Collection and data storage:
Date: 13 June 2024, 09.00-16.00
Venue: Niels Bohr Building 2.3.H.142, Jagtvej 155, 2100 Copenhagen

- Introduction to reproducibility: definitions, issues and origins - lecture (2h)
- Data provenance: keeping track of where data come from - lecture and exercises (1h)
- How do you store data on your computer? Data structures and data naming - lecture and exercises (1h)
- Ethics and GDPR - lecture and practical case reviews (2h)

Day 2 - Reproducible designs, protocols and pre-registration:
Date: 14 June 2024, 09.00-16.00

Venue: Niels Bohr Building 2.3.H.142, Jagtvej 155, 2100 Copenhagen

- Concepts and tools for protocol documentation and study pre-registration - lecture (1.5h)
- Case studies - exercise (1h)
- Using markdown for documentation - practical (0.5h)
- Version control and social coding with Git and GitHub - practical (3h)

Course work (10 hours):

- Using your PhD research data, protocol, code, etc., write a report (minimum 3 pages) describing your starting point: which measures are already in place to increase reproducibility, in terms of the concepts presented during days 1 and 2? What further measures can be taken, and if some cannot be implemented, why not?

Day 3 - Better coding:
Date: 21 October 2024, 09.00-16.00

Venue: TBA

- Literate programming - lecture and exercises (1h)
- Good coding practices - lecture and exercises (2h)
- Time to update your code - practical using students' own analysis scripts (3h)

Day 4 - Better analyses:
Date: 22 October 2024, 09.00-16.00

Venue: TBA

- P-hacking your data - lecture (1h)
- Encapsulate code for reproducibility using containers (2h)
- An introduction to computational analysis methods: permutation, bootstrap, cross-validation, out-of-sample generalization - lecture and exercises (3h)

Course work (18 hours):

- Make a copy of existing code that you have used and/or that is used in your lab, and improve its reproducibility using any of the tools reviewed during the course, from better inline documentation and variable coding to updated analyses.
- Prepare a 10-minute presentation summarizing all of your course work and the measures you have taken to improve reproducibility in your PhD.

Day 5 - Data sharing:
Date: 2 December 2024, 09.00-16.00

Venue: TBA

- The ‘data’ cycle, sharing from raw data to figures - lecture (1h)
- Reproducible publishing - a case study (1h)
- Presentations and discussions/social event with drinks and pizza (4h)

Aim and content
The Reproducible Quantitative Data Science course introduces key concepts, tools and analysis methods for reproducible data analysis in any type of quantitative research study. It is meant as a hands-on crash course in reproducible data analysis for PhD students.

In the course, we will cover research data management and best practices for handling data before introducing the concepts of reproducible designs, protocols and pre-registration of research studies. Next, we will turn to literate programming and good coding practices, with a focus on how students can improve their own code to make it more reproducible. This includes using version control and encapsulating code using containers. We will then go into issues in the actual data analysis and address computational analysis methods such as permutation, bootstrap, cross-validation and out-of-sample generalization. We finish the course by introducing the topic of reproducible publishing.


Formal requirements

We expect students to join the course several months after starting their PhD, so that they already have data and some code. This allows them to apply the concepts taught to their own data and code.
We assume that students have some experience with programming, as analyses cannot be reproduced using a graphical interface but only using code. We’ll try to be as language-agnostic as possible, but prior exposure to bash/git, Matlab or Python is a plus.

During the course, active participation is expected. In session 1, we'll use Padlet to interact with each other (anonymous postings are allowed) and also do group work. In session 2, we'll use GitHub (which you will learn in session 1), and you will be required to share code and review each other’s code. We recommend sharing something you are working on, but if you feel uncomfortable with that, you can prepare something else to be shared and reviewed. In session 3, you must present in front of everybody.


Learning outcome
After course completion the students are expected to be able to:

Knowledge:
- Understand the concepts of reproducible designs, protocols and pre-registration of research studies.
- Understand good coding practices.
- Understand computational analysis methods such as permutation, bootstrap, cross-validation and out-of-sample generalization.

Skills:
- Use version control and social coding.
- Develop literate programming and good coding practices.
- Encapsulate code for reproducibility using containers.

Competences:
- Propose measures to increase reproducibility in their own PhD research data analysis.
- Prepare a manuscript in a reproducible fashion.

Target group
The number of participants is limited to 30, and priority will be given to PhD students from UCPH-SCIENCE and UCPH-SUND.

Lecturers
UCPH lecturers:

- Cyril Pernet
https://di.ku.dk/english/staff/?pure=en/persons/763558
- Melanie Ganz-Benjaminsen
https://research.ku.dk/search/result/?pure=en%2Fpersons%2F341919

Remarks
Inquiries regarding the course can be directed to Melanie Ganz-Benjaminsen (ganz@di.ku.dk).

Examination: Homework will be assigned between sessions and assessed; it counts toward the total number of credits.

Participation fee:
No participation fee for PhD students enrolled at a Danish institution.
All other students are required to pay the participation fee of 3000 DKK.


