National Chengchi University (NCCU) Syllabus

Course Description

Natural language processing (NLP) is an attractive area of artificial intelligence. A wide range of technologies has been investigated to enable machines to communicate with humans. With the progress of machine learning, the power of NLP has been shown in novel applications in areas such as fintech, medicine, and digital humanities.

This course introduces the fundamental topics of NLP, from lexicon and syntax to pragmatics, and shows how to formulate language tasks as machine learning problems. The most recent approaches to NLP, such as Transformer-based large language models (LLMs), are the focus of this course. Real-world data will be used in the final project, and participants are expected to work with textual data to solve real-world problems. In addition, the issues and resources of Chinese language processing will be introduced.

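Natural language processing comprises a range of techniques that give computers the ability to understand and generate language. Combined with techniques such as deep learning, NLP now shows great power in many artificial intelligence applications and is moving toward more forward-looking topics, with application areas including fintech, medicine, and digital humanities.

This course is taught in Chinese. It introduces technical progress in NLP at different levels, from lexicon and syntax to pragmatics; demonstrates how to solve language problems with probability and statistics; and covers the application of cutting-edge machine learning models, such as Transformer-based large language models (LLMs), to NLP. Real data is provided as project material for practicing NLP techniques. In addition, the course specifically examines problems in Chinese language processing and introduces how to apply the latest techniques to analyze Chinese documents.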

Core Competence Analysis Chart

Course Objectives & Learning Outcomes

Understand the fundamental topics in natural language processing and computational linguistics, from lexicon and syntax to pragmatics.
Be familiar with basic NLP tasks such as Chinese word segmentation, part-of-speech tagging, syntactic parsing, and dependency parsing.
Learn to build machine learning models with linguistic features for NLP.
Be aware of recent progress in advanced NLP topics such as sentiment analysis, discourse analysis, and question answering.
Learn to communicate effectively with large language models such as ChatGPT.

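Understand computational linguistics issues at different levels, including lexicon, syntax, and pragmatics.
Learn the principles and application scenarios of Chinese word segmentation, part-of-speech tagging, syntactic parsing, and dependency analysis.
Become familiar with using machine learning models together with suitable linguistic features to analyze documents.
Get to know the latest techniques for advanced topics such as opinion analysis, discourse analysis, and automatic question answering.
Understand the strengths and limitations of large language models such as ChatGPT, and make good use of them in applications.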
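
The schedule treats Chinese word segmentation as a sequence-labeling task. One way to picture that formulation is the sketch below, which assumes the common B/M/E/S character-tagging scheme (begin, middle, end, single); the syllabus does not name a tagging scheme, and the function names and sample sentence are hypothetical.

```python
def words_to_bmes(words):
    """Convert a segmented sentence into one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if not word:  # skip empty strings defensively
            continue
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, everything between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends here
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


# Hypothetical example sentence, already segmented by hand.
segmented = ["自然", "語言", "處理", "很", "有趣"]
tags = words_to_bmes(segmented)
restored = bmes_to_words("".join(segmented), tags)
print(tags)
print(restored)
```

With this encoding, a segmenter only has to predict one tag per character, so any sequence-labeling model can be trained on the tag sequence and its output decoded back into words.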

Course Schedule & Requirements

Course Week	Flexible Supplemental Instruction Week	Flexible Supplemental Instruction Type
16+2 weeks	Week 7	Capstone self-learning
16+2 weeks	Week 18	Capstone self-learning

Week	Subject	In-class Activities & Hours	After-class Activities & Hours
1	Introduction to NLP: An overview of natural language processing.	Lecture: 3 hours	Post-lecture review: 3 hours
2	Linguistic Essentials: A brief introduction to linguistics and its applications in NLP.	Lecture: 3 hours	Post-lecture review: 3 hours
3	Collocation and Topic Modeling: Mining collocated words from a collection of documents and clustering the documents into a number of groups.	Lecture: 3 hours	Post-lecture review: 3 hours
4	Language Modeling: Language models from the statistical n-gram model to deep learning-based models are discussed. Basic models for training word embeddings, including CBOW, Skip-gram, and fastText, will be presented. Self-supervised approaches to large language modeling are also included.	Lecture: 3 hours	Post-lecture review: 3 hours
5	Word Sense Disambiguation: Two approaches to word sense disambiguation.	Lecture: 3 hours	Post-lecture review: 3 hours
6	Text Classification: Basic statistical models for text classification and feature extraction.	Lecture: 3 hours	Post-lecture review: 3 hours; Assignment: 6 hours
7	Children's Day / Tomb Sweeping Day
8	POS Tagging: Introduction to sequence labeling and its important applications in NLP, including part-of-speech tagging. POS tagging in both Chinese and English will be described.	Lecture: 3 hours	Post-lecture review: 3 hours; Assignment: 6 hours
9	Midterm Exam	Exam: 3 hours
10	Chinese Word Segmentation: Chinese word segmentation is a special subject in NLP. How to perform text segmentation with sequence labeling will be introduced. Final project announcement.	Lecture: 3 hours	Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours
11	Neural Networks for NLP: Deep neural networks play an important role in modern NLP. The most recent methodology, pre-trained Transformer-based language models such as BERT, T5, and GPT, which are very powerful and widely used in almost all NLP tasks, will be covered.	Lecture: 3 hours	Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours
12	Advanced Neural Networks for NLP: Other architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) will also be introduced with their applications in NLP.	Lecture: 3 hours	Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours
13	Parsing: Parse trees provide rich information for natural language understanding. This subject introduces two basic parsing schemes and computational models for parsing.	Lecture: 3 hours	Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours
14	Discourse Analysis: Many novel applications are based on discourse analysis. This subject introduces discourse relation recognition and discourse parsing. Other topics in discourse analysis will be briefly described.	Lecture: 3 hours	Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours
15	Semi-supervised Approaches to NLP: Semi-supervised learning is extremely useful in NLP because training data is usually insufficient in novel tasks. Strategies for training models in a semi-supervised fashion will be introduced.	Lecture: 3 hours	Post-lecture review: 3 hours; Final project: 6 to N hours
16	Semi-supervised Approaches to NLP: Semi-supervised learning is extremely useful in NLP because training data is usually insufficient in novel tasks. Strategies for training models in a semi-supervised fashion will be introduced.	Lecture: 3 hours	Post-lecture review: 3 hours; Assignment: 6 hours; Final project: 6 to N hours
17	Term Exam	Exam: 3 hours
18	Self-directed learning
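
As an illustration of the statistical n-gram model listed for Week 4, the following is a minimal sketch of a bigram language model with add-one (Laplace) smoothing. The toy corpus, the function names, and the choice of smoothing are assumptions made for illustration, not course material.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one smoothing.

    The vocabulary here is every token type seen in training, including
    the boundary markers; this is a simplification.
    """
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Hypothetical toy corpus.
corpus = [["i", "like", "nlp"], ["i", "like", "deep", "learning"]]
unigrams, bigrams = train_bigram(corpus)

p_seen = bigram_prob("i", "like", unigrams, bigrams)      # bigram seen twice
p_unseen = bigram_prob("nlp", "deep", unigrams, bigrams)  # bigram never seen
print(p_seen, p_unseen)
```

Smoothing gives the unseen bigram a small non-zero probability, which is the basic problem that the neural language models later in the schedule address in a different way.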

Teaching Approach

Lecture: 70%

Group activity: 10%

Others (Codelab): 20%

Evaluation Criteria

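The midterm and term exams are on-site written exams. Questions cover the techniques and concepts taught in class, as well as applying those techniques to practical problem scenarios.

The project will use forward-looking, practical topics and provide a development dataset. It is run in teams in a Kaggle-like format over one to two months. Evaluation is based mainly on performance, ranking, and the novelty of the method.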

Midterm exam: 20%

Term exam: 30%

Term project: 30%

Assignments: 20%
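
The four weights above sum to 100%. A minimal sketch of the arithmetic follows, assuming each component is scored on a 0-100 scale (the syllabus does not state the scale) and using hypothetical scores; the names `WEIGHTS` and `final_score` are illustrative only.

```python
# Weights in percent, taken from the evaluation criteria above.
WEIGHTS = {"midterm": 20, "term_exam": 30, "term_project": 30, "assignments": 20}


def final_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for illustration only.
example = {"midterm": 80, "term_exam": 70, "term_project": 90, "assignments": 85}
print(final_score(example))  # 81.0
```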

Textbook & References

Yoav Goldberg, Neural Network Methods for Natural Language Processing, Morgan & Claypool Publishers, 2017.

Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.

Syllabus

Course Name: Natural Language Processing

Course type: Elective

