Syllabus

Course Name: Corpus Processing

Type of Credit: Elective

Credit(s): 3.0

Number of Students: 20

Course Details

Course Description

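  • For the first two weeks of the semester, classes will be held remotely in accordance with the university's epidemic prevention regulations. The course link is:
    https://gather.town/invite?token=MpHZEi5B
  • Students who intend to enroll in or audit the course should attend class in the first week to learn about the course content.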

This course is aimed at students who are interested in processing datasets automatically with a programming language. The goal is for students to learn how to write scripts using released packages to support linguistic analysis of data, together with basic knowledge of data processing with machine learning models. In this course, we will handle text data with R, one of the prevalent programming languages today. Research articles on current linguistic issues and applied approaches will be assigned and discussed in depth in class. A mini-hackathon will be held after the midterm week to help students integrate the skills they have learned in the course. At the end of the course, students will give a final presentation and submit a final term paper.

 

Core Competence Analysis Chart


    Course Objectives & Learning Outcomes


    By the end of the course, students are expected to be able to write their own scripts to process text data for their own research projects. They will also learn the basic procedures of machine learning applications.

     

    Course Schedule & Requirements

    *Subject to change

    Week  Topic                                                        Note
    1     Course Introduction
    2     Data preprocessing: advanced search and replacement
    3     Word segmentation and stopwords                              reading
    4     Function and Data Structure: matrix, dataframe, and list
    5     Loop and if/else Statement
    6     Data Visualization
    7     Feature extraction: bag of words and tf-idf                  reading
    8     Machine learning: decision tree and random forest            reading
    9     Machine learning: support vector machine                     reading
    10    mini-Hackathon
    11    mini-Hackathon Presentation
    12    Machine learning: logistic regression                        reading
    13    Opinion mining: crawler                                      reading
    14    Rmarkdown and Shiny                                          reading
    15    Proposal Discussion
    16    R & API
    17    Final Presentation
    18    Term Paper Due

     

    Teaching Approach

    Lecture: 35%
    Discussion: 20%
    Group activity: 20%
    E-learning: 20%
    Others: 5%

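    Evaluation Criteria

    (Tentative)

    Class participation 30%: attendance, in-class discussion, and participation in group discussion

    Homework 30%: completion of homework assignments

    Mini-hackathon 20%: completeness of the overall result and individual participation

    Individual report 20%: completeness of the individual final presentation

     

    *Plagiarism is strictly prohibited.

    *No more than three unexcused absences are allowed.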

    Textbook & References

    Cotton, Richard. (2013). Learning R. O’Reilly.

    Teetor, Paul. (2011). R Cookbook. O’Reilly.


    Respect intellectual property rights; be sure to use legitimate copies of books.

    Course Related Links

    Learning R programming. https://www.tutorialspoint.com/r/r_variables.htm
    R Tutorial for Beginners: Learning R Programming. https://www.guru99.com/r-tutorial.html
    Learn R Programming. https://www.datamentor.io/r-programming/
    R Tutorial – Outstanding Introduction to R Programming for Data Science! https://data-flair.training/blogs/r-tutorial/
    DataCamp. https://www.datacamp.com/
    
    

    Course Attachments

    Use of Smart Devices (smartphones, tablets, etc.) During Class

    Yes
