教學大綱 Syllabus

科目名稱:資料科學

Course Name: Data Science

修別:選

Type of Credit: Elective

學分數 Credit(s): 3.0

預收人數 Number of Students: 20

課程資料Course Details

課程簡介Course Description

This course introduces data science from a pragmatic, practice-oriented viewpoint. Students will learn the concepts, the R programming language, and the tools they need to deal with various facets of data science practice, including data integration, exploratory data analysis, predictive modeling, evaluation, and effective visual communication. By the end of the course, they will be able to apply data science techniques to their own research topics.

[NOTICE]
Homework consists of programming exercises, so you should already have coding experience. This course SHOULD NOT BE your first programming course.


    課程目標與學習成效Course Objectives & Learning Outcomes

    Data science is an emerging, interdisciplinary field that studies the generalizable extraction of knowledge from data. Being a data scientist requires an integrated skill set, including statistics, machine learning, data mining, and big data analytics. This course introduces students to this rapidly growing field and equips them with some of its fundamental principles, R programming skills, and useful tools. Central threads include introduction (weeks 1~2), defining the goal (weeks 3~4), managing data (weeks 5~6), visualizing data (weeks 7~9), and modeling (weeks 11~16). Real cases from various disciplines will be used to put the material in context. To give students hands-on experience, there will be a set of assignments (about 4 to 6) and a final project. Each assignment is designed as an individual step of the whole data science process, so that students can build their final project on the code from those assignments. Besides the assignments and the project, there will be frequent opportunities for in-class programming exercises.

    每周課程進度與作業要求 Course Schedule & Requirements

    教學週次 Course Week | 彈性補充教學週次 Flexible Supplemental Instruction Week | 彈性補充教學類別 Flexible Supplemental Instruction Type

    Week01
    Introduction

    What is data science? big data? deep learning?
    Three components: data, modeling, evaluation
    Data science platforms
    • why choose the R programming language?
    • integrated development environment for R: RStudio
    ​Supporting Materials
    1. Chapters 1, 2, appendix A​
        ​Number of hours invested per week = 6 hours

    Week02
    Documentation and deployment of your code

    Version control with GitHub
    ​Supporting Materials
    1. Chapters 10, 11
        ​Number of hours invested per week = 6 hours

    Week03
    How to evaluate output?

    Specificity, sensitivity, recall, F-score
    Receiver operating characteristic curve, AUC
    Statistical significance: p-value, the false discovery rate
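
    For reference, the classification metrics listed above are usually defined from the counts of a binary confusion matrix. The block below is a sketch under stated assumptions and is not taken from the course materials: the symbols TP, FP, TN, and FN (true/false positives/negatives) and the inclusion of precision are assumptions added for illustration. The F-score is shown in its common F_1 form. Under these definitions, sensitivity and recall are two names for the same quantity.

    ```latex
    \begin{align*}
    \text{sensitivity} = \text{recall} &= \frac{TP}{TP + FN} \\
    \text{specificity} &= \frac{TN}{TN + FP} \\
    \text{precision} &= \frac{TP}{TP + FP} \\
    F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    \end{align*}
    ```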
    ​Supporting Materials
    1. Chapter 5
    2. ROCR package - Visualizing classifier performance in R
        ​Number of hours invested per week = 6 hours

    Week04
    How to perform an evaluation?

    Cross-validation
    Bootstrap and jackknife sampling
    Bias, variance, overfitting
    ​Supporting Materials​
    1. ​Chapter 6.2
        ​Number of hours invested per week = 6 hours

    Week05
    Feature selection/extraction/reduction

    Principal component analysis (PCA), correspondence analysis (CA)
    Probabilistic latent semantic analysis
    • maximum likelihood estimation
    • expectation-maximization algorithm
    Supporting Materials
    1. A tutorial on principal component analysis by Jonathon Shlens
    2. Correspondence Analysis and Related Methods by Michael Greenacre
    3. Multivariate statistics by Michael Greenacre
        ​Number of hours invested per week = 6 hours

    Week06
    Exploring/managing data

    Probabilistic and ideal-data models
    Character/parsimony-based methods
    ​Supporting Materials
    1. Chapters 3, 4​
        ​Number of hours invested per week = 6 hours

    Week07
    School holiday


    Week08
    Visualization (1/2)
    charts, graphs, networks, maps
    ​Interactive visualizations - Shiny app

    Supporting Materials
    1. Simple Graphs with R
    2. Basic Graphs by Quick R
        ​Number of hours invested per week = 6 hours

    Week09
    Visualization (2/2)

    Workflow: scripts
    Exploratory Data Analysis
    Workflow: projects
    Data import

    Supporting Materials
    • R for Data Science
      1. Chapter 6. Workflow: scripts
      2. Chapter 7. Exploratory Data Analysis
      3. Chapter 8. Workflow: projects
      4. Chapter 11. Data import
        ​Number of hours invested per week = 18 hours

    Week10
    Midterm

    Closed book, except for one A4 sheet of notes

     

    Week11
    Unsupervised learning

    Clustering analysis
    Association rules

    Supporting Materials
    1. Chapters 6, 8
        ​Number of hours invested per week = 6 hours

    Week12
    Supervised learning - 1

    Memorization methods​
    Supporting Materials
    1. Chapter 6
        ​Number of hours invested per week = 6 hours

    Week13
    Supervised learning - 2

    Logistic/Linear regression
    ​Supporting Materials
    1. PSDR: Chapter 7.1
    2. ISLR: Chapter 3
    3. PSDR: Chapter 7.2
    4. ISLR: Chapter 4
        ​Number of hours invested per week = 6 hours

    Week14
    Supervised learning - 3

    Generalized Additive Models
    ​Supporting Materials
    1. PSDR: Chapter 9.1​
    2. ISLR: Chapter 7
        ​Number of hours invested per week = 6 hours

    Week15
    Supervised learning - 4

    Decision Tree & Random Forest & XGBoost
    Supporting Materials
    1. PSDR: Chapter 9.1​​
    2. ISLR: Chapter 8
        ​Number of hours invested per week = 6 hours

    Week16
    Supervised learning - 5

    Support Vector Machine
    Supporting Materials
    1. PSDR: Chapter 9.3, 9.4
    2. ISLR: Chapter 9
    3. Support vector machines and kernel methods: status and challenges by Chih-Jen Lin
    4. Talks about Machine Learning by Chih-Jen Lin
        ​Number of hours invested per week = 6 hours
     

    Week17
    Final project presentation
    Rubrics/評分量尺
     

    Week18

    Flexibility Week 彈性活動週​

    TBA   
        ​Number of hours invested per week = 6 hours

    授課方式Teaching Approach

    • 講述 Lecture: 60%
    • 討論 Discussion: 0%
    • 小組活動 Group activity: 0%
    • 數位學習 E-learning: 40%
    • 其他 Others: 0%

    評量工具與策略、評分標準成效Evaluation Criteria

    • 50% 作業/Homework
    • 15% 線上測驗/Zuvio online test
    • 15% 期中考/Midterm
    • 20% 期末專題/Final project
    • <=5% 上課表現(加分)/Attendance & Participation (bonus)

    指定/參考書目Textbook & References

    • 指定 Required
    1. PSDR: Practical Data Science with R, by Zumel, N. & Mount, J. (Manning, 2019). ISBN 9781617295874
    2. ISLR: An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
    3. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, by Hadley Wickham & Garrett Grolemund (1st Edition)
    • 其他參考資料 Other references
    1. ​How to Measure Anything Workbook: Finding the Value of Intangibles in Business
    2. ​Additional material (Credit by Thomas M. Carsey, carsey@unc.edu)
    3. Data Mining with R: Learning with Case Studies, by Torgo, http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/
    4. An Introduction to Data Science, Version 3, by Stanton, http://jsresearch.net/
    5. Machine Learning with R by Lantz, http://www.packtpub.com/machine-learning-with-r/book
    6. A Simple Introduction to Data Science, by Burlingame and Nielsen, http://newstreetcommunications.com/businesstechnical/a_simple_introduction_to_data_science
    7. Ethics of Big Data, by Davis, http://shop.oreilly.com/product/0636920021872.do
    8. Privacy and Big Data, by Craig and Ludloff, http://shop.oreilly.com/product/0636920020103.do
    9. Doing Data Science: Straight Talk from the Frontline, by O’Neil and Schutt, http://shop.oreilly.com/product/0636920028529.do
    10. Springer Textbooks Use R! Series, http://www.springer.com/series/6991
    11. Online search tool Rseek, http://www.rseek.org/
    12. ​The Odum Institute’s online course, http://www.odum.unc.edu/odum/contentSubpage.jsp?nodeid=670


    維護智慧財產權,務必使用正版書籍。 Respect Copyright.

    課程相關連結Course Related Links

    https://www.changlabtw.com/1122-datascience.html
    

    課程附件Course Attachments

    課程進行中,使用智慧型手機、平板等隨身設備 Use of Smart Devices During Class

    需經教師同意始得使用 Requires instructor approval
