Syllabus

Course Name: Corpus Processing

Type of Credit: Elective

Credit(s): 3.0

Number of Students: 20

Course Details

Course Description

This course is for students interested in processing datasets automatically with a programming language. The goal is for students to learn to write scripts using released packages, and to apply basic knowledge of data processing, statistics, and models so that they can analyze data linguistically. We will work with text data in R, a widely used programming language. Research articles on current linguistic issues and applied approaches will be assigned and discussed in depth in class. At the end of the course, students will give a final presentation and submit a final term paper.

*Bring your own laptop to every class.

Core Competence Analysis Chart


    Course Objectives & Learning Outcomes

    By the end of the course, students are expected to be able to write their own scripts to process text data for their own research projects. They will also learn the basic procedures of machine learning applications.

    Course Schedule & Requirements

    Course Week / Flexible Supplemental Instruction Week / Flexible Supplemental Instruction Type

    *Subject to change (course content is tentative; please bring your own laptop to class every week)

    Week  Topic                      Note
    1     Course Introduction
    2     Data preprocessing
    3     Basic Knowledge of Stats   Word segmentation and stopwords
    4     Descriptive Stats          Function and Data Structure: matrix, dataframe, and list
    5     Analytical Stats I         Loop and if/else Statement
    6     Analytical Stats II        Data Visualization
    7     Linear Model               Feature extraction: bag of words and tf-idf
    8     Logistic Regression        Machine learning: decision tree and random forest
    9     Mixed Model and HCA        Machine learning: support vector machine
    10    Opinion mining: crawler
    11    Collostructional Analysis  Rmarkdown and Shiny
    12    R & API
    13    Proposal Discussion
    14    Final Presentation
    15    Final Presentation
    16    Term Paper Due

    Teaching Approach

    Lecture: 35%
    Discussion: 20%
    Group activity: 20%
    E-learning: 20%
    Others: 5%

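    Evaluation Criteria

    (Tentative)

    Class participation, 20%: attendance and participation in class and group discussions
    Homework, 20%: completion of homework assignments
    In-class presentations, 20%: completeness of the overall work and individual participation
    Final oral presentation, 20%
    Final written report, 20%: completeness of the individual final work

    *Plagiarism is strictly prohibited.

    *No more than three unexcused absences are allowed.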

    Textbook & References

    Cotton, Richard. (2013). Learning R. O’Reilly.

    Teetor, Paul. (2011). R Cookbook. O’Reilly.

    Gries, Stefan Th. (2013). Statistics for Linguistics with R: A practical introduction (2nd ed.).

    Stefanowitsch, Anatol, and Gries, Stefan Th. (2005). Covarying collexemes.

    Respect copyright: be sure to use legitimate copies of books.

    Course Related Links

    Learning R programming. https://www.tutorialspoint.com/r/r_variables.htm
    R Tutorial for Beginners: Learning R Programming. https://www.guru99.com/r-tutorial.html
    Learn R Programming. https://www.datamentor.io/r-programming/
    R Tutorial – Outstanding Introduction to R Programming for Data Science! https://data-flair.training/blogs/r-tutorial/
    DataCamp. https://www.datacamp.com/
    
    

    Course Attachments

    Use of smart devices (smartphones, tablets, and other portable devices) during class: Yes
