Advanced Topics in Data Analysis
Provider: Faculty of Health and Medical Sciences
Activity no.: 3339-23-00-00
Enrollment deadline: 27/03/2023
Date and time: 24.04.2023, 00:00 - 05.05.2023, 00:00
Regular seats: 25
Course fee: 7,080.00 kr.
Lecturers: Shyam Gopalakrishnan
ECTS credits: 5.00
Contact person: Ida Marie Bergman Rasmussen, e-mail: ida.mbr@sund.ku.dk
Enrolment Handling/Course Organiser: PhD administration, e-mail: phdkursus@sund.ku.dk
Aim and content
This is a generic course. This means that the course is reserved for PhD students at the Graduate School of Health and Medical Sciences at UCPH.
Anyone can apply for the course, but if you are not a PhD student at the Graduate School, you will be placed on the waiting list until the enrolment deadline. After the enrolment deadline, available seats will be allocated to applicants on the waiting list.
The course is free of charge for PhD students at Danish universities (except Copenhagen Business School) and for PhD students at NorDoc member universities. All other participants must pay the course fee.
Learning objectives
A student who has met the objectives of the course will be able to:
1. Understand the probabilistic principles behind statistical analysis of large-scale datasets in the life, earth and environmental sciences
2. Identify which types of statistical methods are appropriate for different types of large-scale datasets
3. Analyze data efficiently using R or a similar statistical language
4. Diagnose and assess the results of statistical methods used in life, earth and environmental sciences, accounting for the assumptions underlying each test
5. Explain the basic principles of modern high-performance statistical methods, e.g. Monte Carlo methods and deep learning
Content
This course offers an in-depth introduction to state-of-the-art statistical techniques commonly used in life, environmental and earth sciences. It is also a natural follow-up to the course on Fundamentals in Large-Scale Data Analysis offered within the “Life, Earth and Environmental Sciences” Programme. Attendees will learn about the probabilistic underpinnings of popular inferential methods, while also applying these methods to practical, real-world examples using the R programming language. We will focus especially on large-scale datasets, which often involve a high number of variables.
The students will learn how to use advanced statistical techniques, while also obtaining an understanding of the assumptions underlying these methods, as well as their scope and limitations.
First, the students will be exposed to the principles behind frequentist and Bayesian inference.
Then, we will introduce the students to supervised learning methods, including regression models, mixed models, shrinkage methods and support vector machines.
This will be followed by a section on unsupervised learning, including PCA, MDS and clustering.
Finally, we will provide a broad overview of advanced methods, including random forests and deep learning, in various scientific applications.
Participants
The course is broadly meant for students in life, earth and/or environmental sciences who aim to develop their statistical and computational toolbox, in order to be able to tackle large-scale datasets. Students should have some background in basic probability, statistical inference and/or data science.
Course prerequisites
1. A basic understanding of probability theory and distributions.
2. The student must have taken the “Fundamentals in Large-Scale Data Analysis” course, or must have obtained a waiver by demonstrating knowledge of the contents of that course.
Please take a look at the learning objectives of the “Fundamentals in Large-Scale Data Analysis” course for details on what skills the student is expected to have at the end of that course.
Relevance to graduate programmes
The course is relevant to PhD students from the following graduate programmes at the Graduate School of Health and Medical Sciences, UCPH:
- Life, Earth and Environmental Sciences
- Biostatistics and Bioinformatics
Language
English
Form
Lectures interspersed with discussions and group work involving computational exercises in R and the Unix console.
Course directors
- Fernando Racimo, Associate Professor, University of Copenhagen, fracimo@sund.ku.dk
- Shyam Gopalakrishnan, Associate Professor, University of Copenhagen, shyam.gopalakrishnan@sund.ku.dk
Teachers
- Shyam Gopalakrishnan (course director)
- Fernando Racimo (course director)
- Martin Sikora, Associate Professor, KU
- David Duchene, Postdoc, KU
Teaching assistants
- Martin Petr, Postdoc, KU
- Rasmus Henrik Amund Henrikssen, PhD student, KU
- Jazmin Ramos Madrigal, Postdoc, KU
- Julian Regalado Perez, Postdoc, KU
Dates
Block 3, 2 weeks (weeks 16 and 17): April 24th to May 5th, weekdays from 9:00 AM to 2:30 PM
Course location
Teaching rooms in EvoGenomics - Kommunehospital
Registration
Please register before March 28th, 2023.
Seats for PhD students from other Danish universities will be allocated on a first-come, first-served basis and according to the applicable rules.
Applications from other participants will be considered after the last day of enrollment.
Course books
- Statistical Thinking from Scratch - M. D. Edge
- An Introduction to Statistical Learning - James et al.
Further reading
Probability, Statistics and Machine Learning
- Elements of Statistical Learning - freely available online: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12_toc.pdf
- Statistical Inference - Casella and Berger
- Bayesian Data Analysis - Gelman et al.
- Machine Learning: A Probabilistic Perspective - Murphy (selected chapters)
The R programming language
- R in action - Kabacoff
- R for data science - Wickham & Grolemund
Note: All applicants are asked to submit invoice details in case of no-show, late cancellation or obligation to pay the course fee (typically non-PhD students). If you are a PhD student, your participation in the course must be in agreement with your principal supervisor.