Syllabus

Course Name: Natural Language Processing

Type of Credit: Elective

Credit(s): 3.0

Number of Students: 30

Course Details

Course Description

Natural language processing (NLP) is an exciting area of artificial intelligence that gives computers the ability to understand and generate language. A wide range of technologies have been investigated to enable machines to communicate with humans. With advances in machine learning, the power of NLP has been demonstrated in novel applications in areas such as fintech, medicine, and digital humanities.

This course is taught in Mandarin Chinese. It introduces the fundamental topics of NLP, from lexicon and syntax to pragmatics, and shows how to formulate language tasks with probability, statistics, and machine learning. The focus is on the most recent, cutting-edge approaches, such as Transformer-based large language models (LLMs). Real-world data will be provided for the final project, and participants are expected to work with this textual data to solve real-world problems. In addition, the course covers the particular issues and resources of Chinese language processing, including how to apply the latest techniques to analyze Chinese documents.

Core Competence Analysis Chart: Competence Item Descriptions


    Course Objectives & Learning Outcomes

    1. Understand the fundamental topics of natural language processing and computational linguistics at the lexical, syntactic, and pragmatic levels.
    2. Become familiar with the principles and application scenarios of basic NLP tasks such as Chinese word segmentation, part-of-speech tagging, syntactic parsing, and dependency parsing.
    3. Learn to build machine learning models with appropriate linguistic features to analyze documents.
    4. Be aware of the most recent progress in advanced NLP topics such as sentiment (opinion) analysis, discourse analysis, and question answering.
    5. Understand the strengths and limitations of large language models such as ChatGPT, and learn to communicate with them effectively in applications.

    Course Schedule & Requirements


    Week / Subject / In-class Activities & Hours / After-class Activities & Hours

    Week 1: Introduction to NLP
    An overview of natural language processing.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours)

    Week 2: Linguistic Essentials
    A brief introduction to linguistics and its applications in NLP.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours)

    Week 3: Collocation and Topic Modeling
    Mining collocated words from a collection of documents, and clustering the documents into a number of groups.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours)
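The syllabus does not say which association measure is used for collocation mining. The following is a minimal sketch that assumes pointwise mutual information (PMI) over adjacent word pairs. The function name `pmi_collocations`, the `min_count` cutoff, and the toy text are illustrative assumptions. Topic modeling, the clustering half of this week's topic, is not shown.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by PMI(x, y) = log2(P(x, y) / (P(x) * P(y))).

    Pairs seen fewer than `min_count` times are skipped, because PMI is
    unreliable for rare events. (Hypothetical helper for illustration.)
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus, already tokenized by whitespace.
text = ("new york is a big city . i love new york . "
        "the city is big and new york is busy .").split()
for pair, score in pmi_collocations(text):
    print(pair, round(score, 2))
```

With real corpora, a larger `min_count` is usually needed, because PMI strongly favours rare pairs.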

    Week 4: Language Modeling
    Language models ranging from statistical n-gram models to deep-learning-based models are discussed. Basic models for training word embeddings, including CBOW, Skip-gram, and fastText, will be presented. Self-supervised approaches to large language modeling are also covered.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours)
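A bigram model with add-one (Laplace) smoothing is a common first example at the statistical end of this week's topic. The sketch below assumes whitespace-tokenized sentences and uses the hypothetical class name `BigramLM`. It computes conditional probabilities and sentence perplexity only. It does not cover the embedding or neural models also listed for this week.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing (illustrative sketch)."""

    def __init__(self, sentences):
        self.history = Counter()  # c(prev): how often each token starts a bigram
        self.bigrams = Counter()  # c(prev, word)
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Predictable vocabulary: every training word plus the end marker.
        self.vocab = {w for sent in sentences for w in sent} | {"</s>"}

    def prob(self, prev, word):
        """P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)."""
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def perplexity(self, sent):
        """exp of the average negative log-probability per predicted token."""
        tokens = ["<s>"] + sent + ["</s>"]
        log_p = sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))
        return math.exp(-log_p / (len(tokens) - 1))


# Hypothetical toy training corpus.
lm = BigramLM([["i", "like", "nlp"], ["i", "like", "cats"]])
print(lm.prob("i", "like"))                # (2 + 1) / (2 + 5)
print(lm.perplexity(["i", "like", "nlp"])) # seen word order
print(lm.perplexity(["nlp", "like", "i"])) # unseen word order, higher perplexity
```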

    Week 5: Word Sense Disambiguation
    Two approaches to word sense disambiguation.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours)
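The syllabus does not name its two approaches. One widely taught knowledge-based baseline is the simplified Lesk algorithm. It chooses the sense whose dictionary gloss shares the most words with the context. In this sketch, the sense inventory `BANK_SENSES` and the stopword list `STOP` are hypothetical toy data and do not come from a real lexical resource.

```python
def simplified_lesk(context_words, sense_glosses, stopwords=frozenset()):
    """Pick the sense whose gloss has the largest word overlap with the context.

    sense_glosses maps a sense label to its gloss string. Ties go to the
    sense listed first.
    """
    context = {w.lower() for w in context_words} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & (set(gloss.lower().split()) - stopwords))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical toy sense inventory for "bank" and a tiny stopword list.
BANK_SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "the sloping land beside a river or lake",
}
STOP = frozenset({"the", "a", "an", "and", "or", "that", "of", "to", "i", "at", "my"})

sentence = "I deposited money at the bank".split()
print(simplified_lesk(sentence, BANK_SENSES, STOP))
```

Exact word matching misses inflected forms such as "deposited" versus "deposits". Real systems usually lemmatize both sides first.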

    Week 6: Text Classification
    Basic statistical models for text classification, and feature extraction.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Assignment (6 hours)
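The following sketches a basic statistical text classifier: multinomial Naive Bayes over bag-of-words features, with add-one smoothing. The class name, the tiny sentiment training set, and the choice to skip words unseen in training are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        self.total_words = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            # log P(y) + sum of log P(w | y); words unseen in training are skipped.
            score = math.log(n_y / n_docs)
            denom = self.total_words[y] + len(self.vocab)
            for w in doc:
                if w in self.vocab:
                    score += math.log((self.word_counts[y][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Hypothetical toy training data.
train = [("great movie loved it", "pos"), ("awful plot boring", "neg"),
         ("loved the acting", "pos"), ("boring and awful", "neg")]
clf = NaiveBayes().fit([text.split() for text, _ in train], [y for _, y in train])
print(clf.predict("loved it".split()))
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids underflow on long documents.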

    Week 7: Children's Day / Tomb-Sweeping Day (holiday)

    Week 8: POS Tagging
    Introduction to sequence labeling and its important applications in NLP, including part-of-speech (POS) tagging. POS tagging in both Chinese and English will be described.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Assignment (6 hours)
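Sequence labeling for POS tagging is often introduced with a hidden Markov model (HMM) decoded by the Viterbi algorithm. The sketch below assumes hand-set toy probabilities and a three-tag set. In practice these probabilities would be estimated from a tagged corpus, and the course may use other sequence models.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence for a non-empty sentence under a first-order HMM.

    start_p[t], trans_p[prev][t] and emit_p[t][word] are probabilities. Missing
    entries are replaced by `floor` so that unseen events do not produce log(0).
    """
    def log(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path over words[:i+1] ending in tag t
    best = [{t: log(start_p.get(t, 0.0)) + log(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]  # back[i][t]: best previous tag for tag t at position i
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + log(trans_p[s].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + log(trans_p[prev].get(t, 0.0))
                          + log(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Follow the back-pointers from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy HMM parameters.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {"DET": {"NOUN": 0.9, "VERB": 0.1},
         "NOUN": {"VERB": 0.6, "NOUN": 0.3, "DET": 0.1},
         "VERB": {"DET": 0.6, "NOUN": 0.4}}
EMIT = {"DET": {"the": 0.7, "a": 0.3},
        "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.2},
        "VERB": {"barks": 0.8, "sees": 0.2}}
print(viterbi("the dog barks".split(), TAGS, START, TRANS, EMIT))
```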

    Week 9: Midterm Exam
    In class: Exam (3 hours)

    Week 10: Chinese Word Segmentation
    Chinese word segmentation is a distinctive problem in NLP because Chinese text is written without spaces between words. How to perform segmentation as sequence labeling will be introduced. The final project is announced this week.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Assignment (6 hours); Final project (6 to N hours)
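A common way to cast Chinese word segmentation as sequence labeling is to tag each character as B (begin), M (middle), E (end), or S (single-character word). The helpers below convert between a segmented sentence and these labels; their names are hypothetical. A trained tagger, which is not shown, would predict the labels for unsegmented text.

```python
def words_to_bmes(words):
    """Convert a segmented sentence into per-character B/M/E/S labels."""
    labels = []
    for w in words:
        if len(w) == 1:
            labels.append("S")
        else:
            labels.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return labels


def bmes_to_words(chars, labels):
    """Rebuild words from characters and (predicted) B/M/E/S labels."""
    words, current = [], ""
    for ch, tag in zip(chars, labels):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # keep trailing characters if the label sequence ends without E/S
        words.append(current)
    return words


# Hypothetical example: "natural language / processing / very / interesting"
words = ["自然語言", "處理", "很", "有趣"]
chars = "".join(words)
labels = words_to_bmes(words)
print(labels)
print(bmes_to_words(chars, labels))
```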

    Week 11: Neural Networks for NLP
    Deep neural networks play an important role in modern NLP. The most recent methodology will be covered: pre-trained Transformer-based language models such as BERT, T5, and GPT, which are powerful and widely used across almost all NLP tasks.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Assignment (6 hours); Final project (6 to N hours)
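The core operation inside Transformer models such as BERT, T5, and GPT is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Below is a minimal pure-Python sketch over small lists of vectors. It omits multi-head projections, masking, and batching, and the function names are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return outputs


# Hypothetical toy inputs: one query, two keys/values of dimension 2.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```

Dividing by sqrt(d_k) keeps the dot products from growing with dimension, which would otherwise push the softmax toward one-hot weights.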

    Week 12: Advanced Neural Networks for NLP
    Other architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), will also be introduced along with their applications in NLP.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Assignment (6 hours); Final project (6 to N hours)
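As a sketch of the recurrent architecture mentioned for this week, a single-layer Elman RNN updates its hidden state as h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The pure-Python version below assumes weights given as lists of rows and a zero initial hidden state. The function name is hypothetical, and CNNs are not shown.

```python
import math


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run an Elman RNN over a sequence and return the hidden states h_1..h_T.

    inputs: list of input vectors; W_xh, W_hh: weight matrices as lists of rows;
    b_h: bias vector. The initial hidden state h_0 is assumed to be all zeros.
    """
    hidden = [0.0] * len(b_h)
    states = []
    for x in inputs:
        # The comprehension reads the previous `hidden` before it is replaced.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_xh[j], x))
                + sum(u * hj for u, hj in zip(W_hh[j], hidden))
                + b_h[j]
            )
            for j in range(len(b_h))
        ]
        states.append(hidden)
    return states


# Hypothetical 1-dimensional example.
states = rnn_forward([[0.5], [1.0]], W_xh=[[1.0]], W_hh=[[1.0]], b_h=[0.0])
print(states)
```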

    Week 13: Parsing
    Parse trees provide rich information for natural language understanding. This unit introduces two basic parsing schemes and computational models for parsing.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Assignment (6 hours); Final project (6 to N hours)
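The syllabus does not name its two parsing schemes. If constituency parsing is one of them, the CKY algorithm is a standard dynamic-programming recognizer for grammars in Chomsky normal form. The toy grammar and the dictionary encoding of rules below are illustrative assumptions.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognition for a grammar in Chomsky normal form.

    lexical: dict word -> set of preterminals A (rules A -> word)
    binary:  dict (B, C) -> set of parents A (rules A -> B C)
    Returns True if `start` derives the whole sentence.
    """
    n = len(words)
    # table[i][j] holds the nonterminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for B in table[i][k]:
                    for C in table[k][j]:
                        table[i][j] |= binary.get((B, C), set())
    return start in table[0][n]


# Hypothetical toy grammar: S -> NP VP, VP -> V NP, NP -> Det N
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
print(cky_recognize("the dog sees the cat".split(), LEXICAL, BINARY))
```

Storing back-pointers in each cell, or probabilities in the probabilistic CKY variant, would turn this recognizer into a parser that returns trees.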

    Week 14: Discourse Analysis
    Many novel applications are built on discourse analysis. This unit introduces discourse relation recognition and discourse parsing, and briefly describes other topics in discourse analysis.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Assignment (6 hours); Final project (6 to N hours)
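A simple baseline for explicit discourse relation recognition looks up the discourse connective and maps it to a sense. One such sense set is the top-level senses of the Penn Discourse Treebank (PDTB): Comparison, Contingency, Expansion, and Temporal. The lexicon below is a hypothetical toy subset. Real connectives are often ambiguous, and implicit relations, which have no connective, need a learned classifier.

```python
import re

# Hypothetical toy lexicon mapping explicit connectives to PDTB-style
# top-level senses; a real system would learn this from annotated data.
CONNECTIVE_SENSES = {
    "because": "Contingency",
    "so": "Contingency",
    "but": "Comparison",
    "however": "Comparison",
    "although": "Comparison",
    "and": "Expansion",
    "for example": "Expansion",
    "then": "Temporal",
    "after": "Temporal",
    "before": "Temporal",
}


def classify_explicit_relation(sentence):
    """Return (connective, sense) for a connective found in the sentence, else None.

    Longer connectives are tried first so that multi-word ones such as
    "for example" are matched as a unit. None means no explicit connective
    was found; an implicit-relation classifier would be needed instead.
    """
    text = sentence.lower()
    for conn in sorted(CONNECTIVE_SENSES, key=len, reverse=True):
        if re.search(r"\b" + re.escape(conn) + r"\b", text):
            return conn, CONNECTIVE_SENSES[conn]
    return None


print(classify_explicit_relation("I stayed home because it rained"))
```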

    Week 15: Semi-supervised Approaches to NLP
    Semi-supervised learning is especially useful in NLP because labeled training data is usually scarce for novel tasks. Strategies for training models in a semi-supervised fashion will be introduced.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Final project (6 to N hours)
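The syllabus does not list which semi-supervised strategies are taught. Self-training is one classic strategy: train on the labeled data, pseudo-label the unlabeled examples the model is most confident about, and retrain. To keep the sketch short, it assumes a nearest-centroid classifier on numeric feature vectors. It uses the distance margin between the two closest centroids as the confidence score. Both choices and all function names are assumptions.

```python
import math


def train_centroids(points, labels):
    """Fit a nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict_with_confidence(centroids, p):
    """Return (label, margin); margin = 2nd-nearest minus nearest centroid distance."""
    dists = sorted((math.dist(p, c), y) for y, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
    return dists[0][1], margin


def self_train(labeled, labels, unlabeled, threshold=1.0, max_rounds=5):
    """Self-training: repeatedly add confidently pseudo-labelled points, then retrain."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        model = train_centroids(labeled, labels)
        confident = []
        for p in pool:
            y, conf = predict_with_confidence(model, p)
            if conf >= threshold:
                confident.append((p, y))
        if not confident:
            break  # no more confident pseudo-labels to add
        for p, y in confident:
            labeled.append(p)
            labels.append(y)
            pool.remove(p)
    return train_centroids(labeled, labels)


# Hypothetical toy data: two labeled points, three unlabeled ones.
model = self_train([[0, 0], [10, 10]], ["a", "b"], [[1, 1], [9, 9], [5, 5]])
print(model)
```

The ambiguous point [5, 5] is never pseudo-labelled because its margin stays at zero. This threshold is what keeps self-training from reinforcing its own mistakes too quickly.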

    Week 16: Semi-supervised Approaches to NLP (continued)
    Semi-supervised learning is especially useful in NLP because labeled training data is usually scarce for novel tasks. Strategies for training models in a semi-supervised fashion will be introduced.
    In class: Lecture (3 hours)
    After class: Post-lecture review (3 hours); Assignment (6 hours); Final project (6 to N hours)

    Week 17: Term Exam
    In class: Exam (3 hours)

    Week 18: Self-directed learning

    Teaching Approach

    Lecture: 70%
    Discussion: 0%
    Group activity: 10%
    E-learning: 0%
    Others (Codelab): 20%

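    Evaluation Criteria

    The midterm and term exams are in-person written exams. Questions cover the techniques and concepts taught in class, as well as applying those techniques to solve practical problems.

    The term project will address a forward-looking, practical topic, and a development dataset will be provided. Teams compete in a Kaggle-style format over one to two months. Projects are graded mainly on performance, ranking, and the novelty of the method.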

     

    Midterm exam: 20%
    Term exam: 30%
    Term project: 30%
    Assignments: 20%

    Textbook & References

    Yoav Goldberg, Neural Network Methods in Natural Language Processing, Morgan & Claypool Publishers, 2017.

    Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

    Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.

     


    Respect copyright: use only legitimate (licensed) copies of books.


    Use of smart devices (smartphones, tablets, etc.) during class: Yes
