教學大綱 Syllabus

科目名稱:計量語言學

Course Name: Quantitative Linguistics

修別:選

Type of Credit: Elective

學分數 Credit(s): 3.0

預收人數 Number of Students: 20

課程資料Course Details

課程簡介Course Description

This course guides graduate students through the fundamental quantitative methods frequently used in linguistic research. Students are expected to have basic programming skills, especially in R, since the course will not introduce basic R commands from scratch. The first two hours of each class meeting will be mainly lecture, and the last hour will be a lab session for students to get their hands "dirty".

Students will learn why quantitative methods are important, how to analyze large datasets quantitatively, and how to explore the underlying linguistic cues systematically. Beyond essential quantitative methods, the course covers basic machine learning approaches to big data and how to evaluate models. It presents and demonstrates how to implement these approaches in R. Lastly, students must submit a final project at the end of the semester that applies the skills learned in the course.

Furthermore, this course collaborates with the company Trend Micro to tackle cybersecurity-related issues (e.g., the detection of false messages). Students will be provided with real-time datasets in the course and are encouraged to develop projects relevant to (but not limited to) these topics. They are expected to contribute linguistic insights to this interdisciplinary study.

 

核心能力分析圖 Core Competence Analysis Chart



    課程目標與學習成效Course Objectives & Learning Outcomes

    • Understand the concepts and structures of statistical models
    • Learn how to interpret the interactions between the features and the models
    • Know how these statistical models are implemented in machine learning models, and how to perform dimensionality reduction
    • Discuss topics related to false messages in the field of natural language processing, and explore how to tackle these issues from a linguistic perspective

    每周課程進度與作業要求 Course Schedule & Requirements

    TBA

     

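    | Week | Topic | Main References | Other References | Assignment |
    |------|-------|-----------------|------------------|------------|
    | 1 | Course Introduction & Why Quantitative Methods? | | | |
    | 2 | Descriptive Statistics & Probability Distributions | Baayen (2008) Ch2-3; Gries (2021) Ch3; paper readings; instructor's handouts | | Weekly assignment |
    | 3 | Descriptive Statistics & Probability Distributions | Baayen (2008) Ch2-3; Gries (2021) Ch3; paper readings; instructor's handouts | | Weekly assignment |
    | 4 | Test Statistics, Effect Size, Standardization, and Regularization | Baayen (2008) Ch4; Gries (2021) Ch4; paper readings; instructor's handouts | | Weekly assignment |
    | 5 | Inter-annotator Agreement | Paper readings; instructor's handouts | | Weekly assignment |
    | 6 | Linear Regression and Correlation | Baayen (2008) Ch6; Gries (2021) Ch5; paper readings; instructor's handouts | | Weekly assignment |
    | 7 | Linear Mixed Models | Baayen (2008) Ch6-7; Gries (2021) Ch5; paper readings; instructor's handouts | | Weekly assignment |
    | 8 | (Mixed-Effects) Logistic Regression | Baayen (2008) Ch7; Gries (2021) Ch6; paper readings; instructor's handouts | | Weekly assignment |
    | 9 | Mini Hackathon | | | |
    | 10 | Mini Hackathon: Presentation | | | |
    | 11 | Principal Component Analysis and Factor Analysis | Baayen (2008) Ch5; paper readings; instructor's handouts | | Weekly assignment |
    | 12 | Multidimensional Scaling and Hierarchical Cluster Analysis | Baayen (2008) Ch5; paper readings; instructor's handouts | | Weekly assignment |
    | 13 | Discriminative Analysis and Nonlinearities | Paper readings; instructor's handouts | | Final project preparation |
    | 14 | Decision Trees and Random Forests | Gries (2021) Ch7; paper readings; instructor's handouts | | |
    | 15 | Embeddings and Latent Semantic Analysis | Paper readings; instructor's handouts | | |
    | 16 | Industry Talk | | | Trend Micro is invited to share artificial intelligence application cases and issues related to false messages. |
    | 17 | Final Project Presentation | | | Final project showcase |
    | 18 | Term Paper Due | | | Final project paper submission |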

     

     

    授課方式Teaching Approach

    • 講述 Lecture: 30%
    • 討論 Discussion: 30%
    • 小組活動 Group activity: 20%
    • 數位學習 E-learning: 10%
    • 其他 Others: 10% (weekly assignments and reports)

    評量工具與策略、評分標準成效Evaluation Criteria

    TBA

    • Class attendance (30%): students are expected to attend classes and participate in in-class discussions and activities.
    • Weekly reading presentation (30%): each participant should give at least two presentations from the weekly reading list.
    • Midterm proposal (10%): a written proposal for the final project should be handed in in week 9. The proposal should be limited to two pages and include an introduction, a literature review, and the methodology.
    • Final project (30%): students will be asked to complete a final project that demonstrates their grasp of the course lectures. Each project is allotted up to 15 minutes for presentation in week 17, and a final written paper must be handed in in week 18.

    指定/參考書目Textbook & References

    TBA

      1. Guillaume Desagulier. (2017). Corpus Linguistics and Statistics with R. Springer International Publishing.
      2. Garrett Grolemund and Hadley Wickham. (2017). R for Data Science. O’Reilly. Available at: https://r4ds.had.co.nz/.
      3. Stefan Gries. (2016). Quantitative Corpus Linguistics with R: A Practical Introduction (2nd edition). Routledge.
      4. Stefan Gries. (2021). Statistics for Linguistics with R: A Practical Introduction. Walter de Gruyter.
      5. Harald Baayen. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press.
      6. Keith Johnson. (2008). Quantitative Methods in Linguistics. Wiley-Blackwell, Malden, MA.


    維護智慧財產權,務必使用正版書籍。 Respect Copyright.

    課程相關連結Course Related Links

    
                

    課程附件Course Attachments

    課程進行中,使用智慧型手機、平板等隨身設備 To Use Smart Devices During the Class

    Yes
