Backgrounder in Statistical Methods

This is the fourth module in the 2016 Informatics and Statistics for Metabolomics workshop hosted by the Canadian Bioinformatics Workshops. This lecture is by Jeff Xia from McGill University.

How it Begins by Kevin MacLeod is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/...)

Source: http://incompetech.com/music/royalty-...

Artist: http://incompetech.com/

Table of Contents:

00:10 -

01:16 - Yesterday

02:42 - Today

03:26 - Learning Objectives

05:37 - What is Statistics

07:02 - Main Components

09:01 - Types of Data

10:30 - Types of Data

10:55 - Quantitative Data

11:58 - Categorical Data

13:29 - Some Jargon (I)

15:25 - Some Jargon (II)

16:26 - Key Concepts in Statistics

17:30 - Issues when making inferences

19:06 - From samples to population

20:16 - P values

22:34 - Summary/Descriptive Statistics

22:52 - How do we describe the data?

26:36 - Mean, Median, Mode

26:46 - Mean, Median & Mode

27:56 - Variance, SD and SEM

29:59 - Quantiles

31:49 - Mean vs. Variance

33:56 - Univariate Statistics

34:35 - A Bell Curve

36:55 - Features of a Normal Distribution

37:27 - Normal Distribution

38:22 - Some Equations

39:09 - Standard Deviation (σ)

39:42 - Different Distributions

40:41 - Skewed Distribution

41:49 - Fixing a Skewed Distribution

42:23 - Log Transformation

43:35 - Log Transformation (Real Data)

44:31 - Centering, scaling, and transformations

47:13 - The Result

47:15 - Centering, scaling, and transformations

48:48 - The Result

49:28 - Centering, scaling, and transformations

49:29 - Log Transformation (Real Data)

49:31 - Centering, scaling, and transformations

49:34 - Log Transformation (Real Data)

49:34 - Log Transformation

49:35 - Fixing a Skewed Distribution

49:35 - Skewed Distribution

52:15 - Fixing a Skewed Distribution

52:16 - Log Transformation

52:17 - Log Transformation (Real Data)

52:17 - Centering, scaling, and transformations

52:18 - The Result

52:49 - The Result

53:05 - The Result

53:06 - The Result

53:10 - The Result

53:18 - The Result

53:26 - t-tests

54:45 - Types of t-tests

55:39 - Paired t-tests

57:02 - Another approach to group differences

58:13 - Calculating F

59:10 - What can be concluded from a significant ANOVA?

01:00:15 - Different types of ANOVA

01:02:57 - Conclusions

01:04:07 - Understanding P values

01:06:58 - The p-value

01:07:27 - How do we compute a p value

01:07:42 - Non-normal distribution

01:07:51 - How do we compute a p value

01:08:21 - Non-normal distribution

01:09:55 - Normalization

01:10:39 - Boxplots of “standardized” data

01:11:13 - Non-parametric tests

01:12:18 - Empirical P values

01:13:37 - Basic Principle

01:16:02 - A simple example

01:17:12 - Permutation One

01:17:13 - A simple example

01:17:29 - Permutation One

01:17:39 - Permutations

01:17:40 - Permutation One

01:17:40 - A simple example

01:17:47 - Permutation One

01:18:03 - Permutations

01:18:13 - Compute empirical p value

01:20:23 - General Advantages

01:21:16 - Question

01:21:19 - General Advantages

01:21:54 - Question

01:23:01 - Hypothesis Testing & multiple testing issues

01:23:17 - Hypothesis Testing

01:23:52 - Hypothesis Testing (more details)

01:24:20 - Hypothesis Testing & P Value

01:24:51 - Multiple Testing Issues

01:25:39 - Multiple Testing Correction (I)

01:27:09 - Multiple Testing Correction (II)

01:28:43 - High-dimensional data

01:29:10 - Multivariate Statistics

01:29:15 - Multivariate Statistics

01:29:27 - Normal distribution – a single variable

01:29:34 - Bivariate Normal

01:29:43 - Trivariate Normal

01:30:02 - The Reality

01:31:25 - The Practice

01:32:32 - Machine Learning

01:33:38 - Unsupervised Learning methods for high-dimensional data

01:34:36 - Clustering

01:34:42 - Clustering Requires...

01:35:23 - Two common clustering algorithms

01:36:29 - K-means clustering

01:36:34 - K-means clustering

01:37:17 - Nearest Neighbor Algorithm

01:37:24 - K-means clustering

01:37:26 - Nearest Neighbor Algorithm

01:38:00 - Hierarchical Clustering

01:38:08 - Key Parameters: similarities

01:38:42 - Similarity Measurements

01:39:04 - Similarity Measurements

01:39:26 -

01:40:07 - Hierarchical clustering & heatmap

01:40:49 - Principal Component Analysis (PCA)

01:40:50 - Hierarchical clustering & heatmap

01:41:01 - Principal Component Analysis (PCA)

01:42:35 - Visualizing PCA

01:43:27 - PCA - The Details

01:44:45 -

01:44:58 - PCA - The Details

01:45:09 -

01:45:32 - Principal Components Analysis on:

01:45:58 - Eigenfaces

01:46:19 - Widely used in metabolomics

01:47:15 - PCA Loadings Plot

01:47:30 - Scores & Loadings

01:47:35 - PCA Details/Advice

01:47:36 - Scores & Loadings

01:48:29 - PCA Details/Advice

01:49:18 - PCA Summary

01:49:51 - PLS-DA

01:50:50 - PCA vs. PLS-DA

01:51:16 - Use PLS-DA with Caution

01:51:38 - Cross Validations

01:52:17 - Common Splitting Strategies

01:52:40 - Components and Features

01:52:45 - Common Splitting Strategies

01:52:58 - Components and Features

01:54:47 - Permutation Tests

01:58:48 -
