Type of Credit: Elective
Credit(s)
Number of Students
Natural language processing (NLP) is an attractive area of artificial intelligence. A wide range of technologies have been investigated to enable machines to communicate with humans. With the progress of machine learning, the power of NLP has been demonstrated in novel applications in areas such as fintech, medicine, and digital humanities.
This course introduces the fundamental topics of NLP, from the lexicon and syntax to pragmatics, and shows how to formulate language tasks as machine learning problems. The focus of the course is the most recent, cutting-edge approaches to NLP, such as Transformer-based large language models (LLMs). Real-world data will be used in the final project, and participants are expected to process textual data to solve real-world problems. In addition, the issues and resources of Chinese language processing will be introduced.
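Natural language processing comprises a set of technologies that give computers the ability to understand and generate language. Combined with techniques such as deep learning, NLP now shows great power in many artificial intelligence applications. It continues to advance toward more forward-looking problems, with application domains including fintech, medicine, and digital humanities.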
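This course is taught in Chinese. It introduces advances in NLP techniques at different linguistic levels, including the lexicon, syntax, and pragmatics. It demonstrates how to solve language problems with probability and statistics, and how cutting-edge machine learning models, such as Transformer-based large language models (LLMs), are applied in NLP. Real data are provided as material for the term project, so that students can practice NLP skills and techniques. In addition, the course specifically discusses problems in Chinese language processing and introduces how to apply the latest techniques to analyze Chinese documents.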
Description of Competency Items
Course Week | Flexible Supplemental Instruction Week | Flexible Supplemental Instruction Type |
---|---|---|
Week | Subject | In-class Activities & Hours | After-class Activities & Hours |
---|---|---|---|
1 | Introduction to NLP: An overview of natural language processing. | Lecture: 3 hours | Post-lecture review: 3 hours |
2 | Linguistic Essentials: A brief introduction to linguistics and its applications in NLP. | Lecture: 3 hours | Post-lecture review: 3 hours |
3 | Collocation and Topic Modeling: Mining collocations from a collection of documents, and clustering the documents into a number of groups. | Lecture: 3 hours | Post-lecture review: 3 hours |
4 | Language Modeling: Language models ranging from the statistical n-gram model to deep learning-based models are discussed. Basic models for training word embeddings, including CBOW, Skip-gram, and fastText, will be covered. Self-supervised approaches to large language modeling are also included. | Lecture: 3 hours | Post-lecture review: 3 hours |
5 | Word Sense Disambiguation: Two approaches to word sense disambiguation. | Lecture: 3 hours | Post-lecture review: 3 hours |
6 | Text Classification: Basic statistical models for text classification and feature extraction. | Lecture: 3 hours | Post-lecture review: 3 hours; Assignment: 6 hours |
7 | Children's Day / Tomb-Sweeping Day (holiday) | | |
8 | POS Tagging: An introduction to sequence labeling and its important applications in NLP, including part-of-speech (POS) tagging. POS tagging in both Chinese and English will be described. | Lecture: 3 hours | Post-lecture review: 3 hours; Assignment: 6 hours |
9 | Midterm Exam | Exam: 3 hours | |
10 | Chinese Word Segmentation: Chinese word segmentation is a distinctive problem in NLP, because Chinese text is written without spaces between words. How to perform segmentation with sequence labeling will be introduced. Final project announcement. | Lecture: 3 hours | Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours |
11 | Neural Networks for NLP: Deep neural networks play an important role in modern NLP. The course will cover the most recent methodology: pre-trained Transformer-based language models such as BERT, T5, and GPT, which are powerful and widely used in almost all NLP tasks. | Lecture: 3 hours | Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours |
12 | Advanced Neural Networks for NLP: Other architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), will be introduced along with their applications in NLP. | Lecture: 3 hours | Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours |
13 | Parsing: Parse trees provide rich information for natural language understanding. This topic introduces two basic parsing schemes and computational models for parsing. | Lecture: 3 hours | Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours |
14 | Discourse Analysis: Many novel applications are based on discourse analysis. This topic introduces discourse relation recognition and discourse parsing. Other topics in discourse analysis will be briefly described. | Lecture: 3 hours | Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours |
15 | Semi-supervised Approaches to NLP: Semi-supervised learning is especially useful in NLP because labeled training data are usually insufficient for novel tasks. Strategies for training models in a semi-supervised fashion will be introduced. | Lecture: 3 hours | Post-lecture review: 3 hours; Final project: 6 to N hours |
16 | Semi-supervised Approaches to NLP (continued) | Lecture: 3 hours | Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours |
17 | Term Exam | Exam: 3 hours | |
18 | Self-directed Learning | | |
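Week 4 starts from the statistical n-gram model. Below is a minimal sketch of a bigram model. It assumes add-one (Laplace) smoothing, which is an illustrative choice and not necessarily the smoothing method taught in the course. The function names `train_bigram`, `bigram_prob`, and `perplexity` are hypothetical. The sketch shows how counts become a conditional probability P(w | w_prev) and how perplexity scores a sentence.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count history tokens and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])                  # tokens that act as a history
        bigrams.update(zip(padded[:-1], padded[1:]))  # (previous, next) pairs
    # Possible next tokens: every training word plus the end marker (<s> is never predicted).
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return unigrams, bigrams, vocab


def bigram_prob(w_prev, w, unigrams, bigrams, vocab):
    """Add-one smoothed P(w | w_prev); an assumed smoothing choice for illustration."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))


def perplexity(tokens, unigrams, bigrams, vocab):
    """exp of the average negative log-probability per predicted token (including </s>)."""
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = sum(
        math.log(bigram_prob(p, w, unigrams, bigrams, vocab))
        for p, w in zip(padded[:-1], padded[1:])
    )
    return math.exp(-log_prob / (len(padded) - 1))
```

For example, after training on `[["i", "like", "nlp"], ["i", "like", "deep", "learning"]]`, the model gives the word order seen in training a lower perplexity than a scrambled order such as `["nlp", "like", "i"]`. Deep learning-based language models replace these counts with learned parameters, but they are evaluated with the same perplexity measure.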
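Week 10 treats Chinese word segmentation as sequence labeling. A common way to set this up is the BMES scheme, where each character is tagged as the Begin, Middle, or End of a multi-character word, or as a Single-character word. Using this tag set is an assumption here; the course may use a different one. The sketch shows only the encoding and decoding steps. The tagger that actually predicts the tags (for example, a CRF or a neural sequence labeler) is omitted, and the function names are hypothetical.

```python
def words_to_tags(words):
    """Map a segmented sentence to one BMES tag per character.

    B = begin, M = middle, E = end of a multi-character word; S = single-character word.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and (possibly predicted) BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:  # a new word starts; flush an unfinished one
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):              # the current word ends at this character
            words.append(current)
            current = ""
    if current:                            # tolerate an ill-formed tail ending in B or M
        words.append(current)
    return words
```

To build training data for a tagger, apply `words_to_tags` to a segmented corpus, such as `["我", "喜歡", "自然語言處理"]`, which yields one tag per character. At prediction time, `tags_to_words` turns the tagger's output back into words. The decoder deliberately tolerates ill-formed tag sequences, because a model's predictions are not guaranteed to be valid BMES sequences.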
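The midterm and term exams are in-person written exams. Questions cover the techniques and concepts taught in class, as well as applying these techniques to practical problem scenarios.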
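The term project features forward-looking, practical topics, and a development dataset is provided. Students work in teams in a Kaggle-like competition format for one to two months. Evaluation is based mainly on performance, ranking, and the novelty of the method.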
Midterm exam: 20%
Term exam: 30%
Term project: 30%
Assignments: 20%
Yoav Goldberg, Neural Network Methods in Natural Language Processing, Morgan & Claypool Publishers, 2017.
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.