Type of Credit: Elective
Credit(s)
Number of Students
This course provides a semester-long introduction to data analysis, probability and statistics, and data science. Its purpose is to help students learn the powerful tools in R for statistical analysis and data science. We will cover probability and statistics, exploratory data analysis, data visualization, and machine learning. We plan to discuss core principles and a few methods of machine learning, such as the bias-variance tradeoff, cross-validation, loss functions and penalization, linear regression, logistic regression, and tree-based methods.
Description of Competency Items
The objective of the course is to lay a foundation for the analysis of real-world data and to equip students with the statistical, computing, and data-related skills in R needed to answer important statistical questions.
Course Week | Flexible Supplemental Instruction Week | Flexible Supplemental Instruction Type |
---|---|---|
Tentative Course Schedule:
W1 02/19: Course Introduction & Mechanics; Getting Started with R Programming
W2 02/26: Getting Started with R Programming (cont'd)
W3 03/05: Introduction to tidyverse; R Markdown
W4 03/12: Data Transformation
W5 03/19: Data Aggregation; Data Merging & Reshaping
W6 03/26: Probability & Statistics in R
W7 04/02: No Class
W8 04/09: Midterm Exam
W9 04/16: LLN, CLT & Statistical Inference in R
W10 04/23: Machine Learning Introduction & Fundamental Concepts
W11 04/30: ML: Linear Regression & Logistic Regression
W12 05/07: ML: Decision Trees (CART)
W13 05/14: ML: Ensemble Methods
W14 05/21: No Class
W15 05/28: Data Visualization & String Manipulation
W16 06/04: Final Presentations of Group Data-Analysis Projects
W17 06/11: Writing Up the Term Paper (No Class)
W18 06/18: Self Study (No Class)
Class attendance & participation (10%)
Problem sets (15%): The problem sets will include both problem solving and computer tasks. The assignments will be reviewed in class if necessary. You are encouraged to form a study group with your classmates, but you must write up your own answers. Problem sets with identical answers will not be accepted.
Weekly in-class group practice (25%)
The data-analysis group project (25%), including the final presentation (10%) and the term paper (15%).
There is no textbook for the course. Lecture slides, R scripts, and other class materials will be posted on Moodle weekly.
The (optional) supplementary textbooks:
- Introduction to Data Science - Data Analysis and Prediction Algorithms with R, by Irizarry (CRC Press, 2020).
- Hands-On Programming with R, by Grolemund (O'Reilly, 2014).
- R for Data Science - Import, Tidy, Transform, Visualize, and Model Data, by Wickham and Grolemund (O'Reilly, 2017).
- An Introduction to Statistical Learning - with Applications in R, by James, Witten, Hastie, and Tibshirani (Springer, 2013).
- Modern Data Science with R, by Baumer, Kaplan and Horton (CRC Press, 2017).