Syllabus

Course Name: Data Science

Type of Credit: Partially Required

Credit(s): 3.0

Number of Students: 30

Course Details

Course Description

Artificial Intelligence (AI) refers to intelligence demonstrated by machines, in contrast to the natural intelligence displayed by animals, including humans. Computers were initially used mainly for numerical calculation, and later for applications that support routine tasks, such as retrieving news articles from the Internet. Achieving AI, however, requires a substantial amount of data and careful handling of many details and issues. A notable AI project is ChatGPT, whose objective is an advanced language model capable of generating human-like text. Through extensive training on diverse datasets, ChatGPT uses deep learning techniques to interpret context and produce coherent responses, making it a powerful tool for natural language processing tasks. In the GPT series, GPT-3 was trained on a massive dataset of text and code drawn from the Internet, books, code repositories, and other sources. The exact composition of the dataset is not publicly known, but it is estimated to be over 500 gigabytes in size.

The course covers a range of topics in data science. It begins with an introduction to data and then covers computer vision (CV) tasks such as semantic segmentation, image classification, and object detection, as well as natural language processing (NLP) tasks such as language modeling, question answering, machine translation, sentiment analysis, and text generation. It also covers time series analysis, including anomaly detection and forecasting, and speech topics such as speech recognition and speech synthesis.

Core Competence Analysis Chart

Description of Competence Items


    Course Objectives & Learning Outcomes

    The aim of this course is to cultivate participants' understanding of the importance, types, and capabilities of data. For instance, we will delve into natural language processing and explore tasks in this domain that computers can perform, such as question answering, machine translation, and sentiment analysis. In the final project, participants will apply their knowledge to real-world scenarios and build practical solutions using the concepts learned throughout the course.

    Course Schedule & Requirements

    Course Week; Flexible Supplemental Instruction Week; Flexible Supplemental Instruction Type

    Week

    Topic

    Content and Assigned Reading

    Teaching Activities and Homework

    1

    Introduction to data & ChatGPT
    (09/10)

    Self-made teaching materials

    Lecture

    2

    Moon Festival
    (09/17)

       

    3

    What Are the Problems with Data?
    (09/24)

    Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248-255). IEEE.

    • Lecture
    • Practice: segmentation with K-Means clustering and KNN
    • Homework: semantic segmentation.
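For the clustering practice above, a minimal sketch of K-Means segmentation might look like the following. It is an illustration only, not the course's official material: it uses plain Python rather than any particular library, and the 4x4 grayscale `image`, the helper names `kmeans_1d` and `segment`, and the choice of k = 2 are all assumptions made for the example.

```python
import random


def kmeans_1d(values, k, iterations=20, seed=0):
    """Cluster scalar values (e.g. grayscale intensities) into k groups."""
    rng = random.Random(seed)
    # Start from k distinct observed values chosen at random.
    centroids = rng.sample(sorted(set(values)), k)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids


def segment(image, k=2):
    """Return a label map with one cluster index per pixel (0 = darkest)."""
    pixels = [p for row in image for p in row]
    centroids = sorted(kmeans_1d(pixels, k))
    return [
        [min(range(k), key=lambda i: abs(p - centroids[i])) for p in row]
        for row in image
    ]


# Hypothetical 4x4 grayscale image: dark background with a bright square.
image = [
    [10, 12, 11, 13],
    [12, 200, 210, 11],
    [10, 205, 198, 12],
    [11, 13, 10, 12],
]
labels = segment(image, k=2)
for row in labels:
    print(row)
```

Sorting the centroids before labelling keeps the label numbering stable (0 for the darker cluster), which makes the resulting mask easier to compare across runs.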

    4

    CV - Introduction to Image Classification
    (10/01)

    • Lecture
    • Practice: SVM classification
    • Homework: Parameter tuning.

    5

    CV - Introduction to Image Segmentation
    (10/08)

    6

    CV - Image Generation
    (10/15)

    Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684-10695).

    • Lecture
    • Practice: Stable Diffusion & Midjourney
    • Homework: Train a classification model on AI-generated content and compare its performance

    7

    NLP - Introduction to Language Modeling (1)
    (10/22)

    Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.

    • Lecture
    • Practice: Word Vector & classification

    8

    NLP - Introduction to Language Modeling (2)
    (10/29)

    • Lecture
    • Practice: Text processing
    • Homework: Semantic analysis

    9

    NLP - Introduction to Text Generation
    (11/05)

    • Lecture
    • Practice: Markov Chain, GPT-2
    • Homework: Compare Markov Chain with ChatGPT
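As a rough illustration of the Markov chain practice above, the sketch below builds a first-order word-level chain and random-walks it to generate text. The toy `corpus`, the function names `build_chain` and `generate`, and the fixed random seed are assumptions chosen for the example, not part of the course materials.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=10, seed=0):
    """Random-walk the chain from `start`, stopping at a dead end."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < max_words and output[-1] in chain:
        # Repeated followers appear several times in the list, so
        # rng.choice samples them in proportion to their frequency.
        output.append(rng.choice(chain[output[-1]]))
    return " ".join(output)


# Hypothetical toy corpus; a real exercise would use a much larger text.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
sentence = generate(chain, "the", max_words=8)
print(sentence)
```

Because each next word depends only on the current word, the output is locally plausible but has no long-range coherence, which is the main contrast with a model such as GPT-2 or ChatGPT.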

    10

    NLP - Other Tasks (1)
    (11/12)

    Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

    • Lecture
    • Practice: Machine reading comprehension & sentence classification using Hugging Face
    • Homework: sentence classification

    11

    NLP - Other Tasks (2)
    (11/19)

    • Lecture
    • Practice: Machine reading comprehension & question answering using Hugging Face
    • Homework: question answering

    12

    NLP Workshop
    (11/26)

     

    • Flexible Week
    • TA-led CSCL workshop on classification

    13

    NLP Workshop
    (12/03)

     

    • Lecturer-led CSCL workshop on evaluation
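The syllabus does not say which evaluation methods the workshop covers. Assuming it follows on from the classification workshop, one common starting point is precision, recall, and F1 for a binary classifier, sketched below. The label lists `y_true` and `y_pred` are made-up data, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up gold labels and predictions for eight sentences.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```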

    14

    NLP - Pretraining Model
    (12/10)

    Self-made teaching materials

    • Lecture
    • Practice: RNN
    • Homework: stock price prediction, performance comparison
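For the performance-comparison part of the homework above, it can help to have simple baselines to measure a trained model against. The sketch below is not an RNN: it only compares two naive forecasters by mean absolute error (MAE), under the assumption that MAE is an acceptable metric. The `prices` series is invented, and all function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(series):
    """Predict each value (from the second onward) as the previous one."""
    return series[:-1]


def moving_average_forecast(series, window=3):
    """Predict each value as the mean of the `window` values before it."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series))
    ]


# Invented closing prices, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]
window = 3

# Score both baselines on the same targets so the comparison is fair.
targets = prices[window:]
naive_mae = mean_absolute_error(targets, naive_forecast(prices)[window - 1:])
ma_mae = mean_absolute_error(targets, moving_average_forecast(prices, window))
print(f"naive MAE = {naive_mae:.2f}, moving-average MAE = {ma_mae:.2f}")
```

A learned model such as an RNN is only worth its extra complexity if it beats baselines like these on held-out data.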

    15

    In-class speech
    (12/17)

    (Tentative) Mr. Veeresh Ittangihal, Data Scientist at Micron Technology - Data Scientist in the Semiconductor Industry

    16

    Data Collection
    (12/24)

     

    • Lecture
    • Practice: Crawler
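The core step of a crawler is extracting links from a fetched page. The sketch below shows one way to do that with Python's standard library. To stay self-contained it parses an in-memory HTML string instead of making a network request, and the `page` content, the `https://example.com/` base URL, and the class name `LinkExtractor` are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's URL.
                    self.links.append(urljoin(self.base_url, value))


# Hypothetical page content; a real crawler would download this.
page = """
<html><body>
  <a href="/news/1">First article</a>
  <a href="https://example.com/news/2">Second article</a>
  <a name="anchor-without-href">No link here</a>
</body></html>
"""
extractor = LinkExtractor("https://example.com/")
extractor.feed(page)
print(extractor.links)
```

A full crawler would add fetching, a queue of unvisited URLs, and duplicate detection, and it should respect each site's robots.txt and terms of use.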

    17

    Final Project
    (12/31)

     

    • Flexible week: implement your project

    18

    Final Project
    (01/07)

     

    Final Project Presentation

    Teaching Approach

    Lecture: 30%
    Discussion: 20%
    Group activity: 50%
    E-learning: 0%
    Others: 0%

    Evaluation Criteria

    1. Attendance (10%): This is an in-person course, and you are expected to attend class every week. In line with the school's epidemic prevention policies, we will switch to online classes (Google Meet) if another COVID-19 outbreak occurs.
    2. Homework (40%): Homework will be assigned almost every week, and you (or your team) must submit it to the learning management system (Google Classroom) on time. Late submissions will not be accepted.
    3. Final Project (50%): We will publish one dataset on the Internet. You (or your team) must collect the data, describe its characteristics, explain the intended tasks, and give a simple demo on your data to demonstrate its quality.

    Generative AI is warmly welcomed in classroom settings, offering innovative tools and methods to enhance teaching and learning experiences. Its application in education encourages interactive and personalized learning approaches.

    Textbook & References

    1. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248-255). IEEE.
    2. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
    3. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021, July). Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.
    4. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
    5. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684-10695).
    6. Self-made teaching materials

    Respect copyright: use only legitimate copies of books.

    Use of Smart Devices (Smartphones, Tablets, etc.) During Class: Yes
