Syllabus

Course Name: Data Science for Cybersecurity

Type of Credit: Elective

Credit(s): 3.0

Number of Students: 15

Course Details

Course Description

Please see the following link for updated information.

https://sites.google.com/view/mikehsiao/teaching/data-science-for-cybersecurity-2024

This course is an introductory class for students interested in cybersecurity analysis using machine learning methods. Students will become familiar with the tools, algorithms, concepts, and execution environment needed to analyze cybersecurity data, and will learn to act as architects who solve security-related problems with data analysis algorithms and tools. Related security concepts, data analysis theories, research papers, and background knowledge will be covered in class. We will introduce several security systems that implement data analysis algorithms to achieve their security goals.

Students should have taken programming courses beforehand, such as Programming Language I/II. The programming language used in this class is Python (we will NOT cover any Python tutorial), and we will use TensorFlow and Keras for AI-based analysis. You MUST be comfortable writing programs, able to search for solutions in online documentation and on Stack Overflow, and able to debug on your own. This course REQUIRES students to implement Python scripts.

This course is designed for students in their third or fourth year of college. If you have already taken an advanced AI/ML/DM course, you may want to skip this course.

Core Competence Analysis Chart


    Course Objectives & Learning Outcomes

    • Understand the relationship between cybersecurity and security management.
    • Understand the concepts of detection, the profiling subject, profiling techniques, misuse detection, and anomaly detection.
    • Understand the concepts of static analysis and dynamic analysis.
    • Become familiar with the data analysis environment, GPU-based computation, and cloud computing.
    • Understand data analysis algorithms: distance functions, similarity functions, classification, clustering, and machine learning algorithms for security applications.
    • Understand neural network structures and algorithms.
    • Understand how security-related information systems operate from the perspective of a data-driven system: intrusion detection systems, anomaly detection systems, spam mail filters, and sequence analysis systems.
    • Understand visual machine learning tools such as Orange.
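As a small illustration of the distance and similarity functions named in the objectives above, the sketch below computes Euclidean distance and cosine similarity between two feature vectors. It is only a minimal example: the host feature vectors are hypothetical and are not taken from the course material.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical per-host features (illustrative values only):
# [connections per minute, distinct ports contacted, failed logins]
host_a = [3.0, 4.0, 0.0]
host_b = [6.0, 8.0, 0.0]

distance = euclidean_distance(host_a, host_b)
similarity = cosine_similarity(host_a, host_b)
print(distance, similarity)
```

The two hosts point in the same direction in feature space, so their cosine similarity is high even though their Euclidean distance is not zero. This is one reason the choice of function matters when profiling behavior.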

    Course Schedule & Requirements

    Course Week / Flexible Supplemental Instruction Week / Flexible Supplemental Instruction Type

    | Week | Topic | Content and Reading Assignment | Teaching Activities and Homework | Workload: In-class Hours | Workload: Outside-of-class Hours |
    |---|---|---|---|---|---|
    | 1 | Introduction to Cybersecurity | Security Management, Cyber Attack, Data Analysis Environment | Lecture. | 3 | 6 |
    | - | Reserved | Lecture | Lecture. | 3 | 6 |
    | 3 | Supervised Learning: classification I | Lecture: Table-Based Data and Data Analysis Process, Visualized Machine Learning Tool | Lecture. Homework: simple classification | 3 | 6 |
    | 4 | Supervised Learning: classification II | Lecture: Supervised Learning Algorithms, Tree-based Classification | Lecture. Homework: tree-based classification | 3 | 6 |
    | 5 | Unsupervised Learning: clustering | Lecture: Unsupervised Learning Algorithms, Problematic Data | Lecture. Homework: clustering email, missing values | 3 | 6 |
    | 6 | Static Analysis | Lecture: Static Analysis, Digital Signature | Lecture. Class Demonstration. Homework: implement a PE parser | 3 | 6 |
    | - | Reserved | Lecture | Lecture. | 3 | 6 |
    | 8 | Dynamic Analysis and Packet | Lecture: Network: Packet Capture, System Call and API Call | Lecture. Homework: packet filter | 3 | 6 |
    | 9 | Midterm | Midterm | Midterm | 3 | 6 |
    | 10 | Deep Learning Basics | Lecture: The concept of Neural Network. | Lecture. | 3 | 6 |
    | 11 | Latent Space | Lecture: Activation Function Visualization, Autoencoder | Lecture. Homework: MNIST | 3 | 6 |
    | 12 | Latent Space II | Lecture: K-means and Self-Organizing Map, Word Embedding | Lecture. Homework: SOM | 3 | 6 |
    | 13 | Language Model | Lecture: BERT | Lecture. Homework: downstream detection | 3 | 6 |
    | 14 | Text-based Analysis with Orange | Lecture: The concept of data visualization. | Lecture. Homework. | 3 | 6 |
    | 15 | Intrusion Detection | Intrusion Detection | Lab | 3 | 6 |
    | 16 | Anomaly Detection | Anomaly Detection | Lab | 3 | 6 |
    | 17 | Project Presentation | Project Presentation | Project Presentation | 3 | 6 |
    | 18 | Final | Final | Final | 3 | 6 |

    Teaching Approach

    • Lecture: 60%
    • Discussion: 10%
    • Group activity: 20%
    • E-learning: 10%
    • Others: 0%

    Evaluation Criteria

    • Homework (45%): programming exercises and essays. You MUST read the Academic Integrity section before taking this class.
    • Project (15%): students write an analysis program on a security-related data set to demonstrate their understanding of security issues and their data analysis skills. A proposal, a report, a presentation, and code uploaded to GitHub are required.
    • Midterm and Final (40%)
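The weighting above can be sketched as a short calculation. This is only an illustration under stated assumptions: each component is scored from 0 to 100, and the midterm and final are treated as one combined exam score because the syllabus does not state how the 40% is split between them. The function name and sample scores are hypothetical.

```python
# Component weights in percent, as listed in the evaluation criteria.
WEIGHTS = {"homework": 45, "project": 15, "exams": 40}


def course_grade(scores):
    """Weighted course grade from component scores on a 0-100 scale.

    Assumption: "exams" is a single combined midterm-and-final score,
    since the syllabus does not say how the 40% is divided.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student scores, for illustration only.
example = {"homework": 80, "project": 90, "exams": 70}
print(course_grade(example))  # 77.5
```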

     

    The Problem Solving Through Inquiry and Data Analysis rubric can be found here. You MUST read it carefully before submitting your first homework. It shows exactly how you will be assessed, which also helps support academic integrity.

     

    Academic Integrity

    • Plagiarism is a serious breach of academic trust. In academic work, our words, ideas, and programs are the value of our work, so turning in someone else’s work as if it were your own is a form of theft. When you use someone else’s words, ideas, or programs without crediting their source or authorship, you are plagiarizing. So here’s the bottom line: submit original work only, and give credit for any ideas, writing, words, or programs that come from someone other than you. Plagiarized work will automatically receive a “0” or “F” for the assignment.
    • Cheating usually arises out of desperation, and everyone occasionally runs into problems and finishes their work late. This class therefore accepts late homework submissions, with a 15% penalty per day. We encourage you to complete your homework rather than drop it. Oral discussion with classmates, the TA, and the lecturer is welcome, but you MUST NOT share any of your code in any form.
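The late policy above can be read in more than one way, so the sketch below makes its assumptions explicit: the 15% is deducted from the earned score for each full day late, the deduction is linear rather than compounding, and the score never drops below zero. None of these details are stated in the syllabus, and the function name is hypothetical. Check with the instructor for the actual rule.

```python
LATE_PENALTY_PER_DAY = 0.15  # 15% per day, from the policy above


def late_adjusted_score(raw_score, days_late):
    """Score after the late penalty.

    Assumptions (not stated in the syllabus): the penalty is linear,
    applies to the earned score, and is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    factor = max(0.0, 1.0 - LATE_PENALTY_PER_DAY * days_late)
    return raw_score * factor


# Hypothetical homework scored 80 and submitted 0, 2, and 7 days late.
for days in (0, 2, 7):
    print(days, late_adjusted_score(80, days))
```

Under these assumptions a submission loses all credit after seven days, so submitting a day or two late still keeps most of the score.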

    Textbook & References

    • Network Security Through Data Analysis, Michael Collins, O'Reilly Media, 2014.
    • Data-Driven Security: Analysis, Visualization and Dashboards, Jay Jacobs and Bob Rudis, Wiley, 2014.
    • Machine Learning for Cyber Security
    • Data Science for Cyber-Security
    • Awesome Machine Learning for Cyber Security
    • Python Data Science Handbook
    • Malware Data Science: Attack Detection and Attribution, Joshua Saxe and Hillary Sanders, No Starch Press, Nov. 2018.
    • Python for Data Analysis, Wes McKinney, O'Reilly Media, October 2012.
    • https://machinelearningmastery.com


    Respect copyright: be sure to use legitimate copies of books.

    Course Related Links

    https://sites.google.com/view/mikehsiao/teaching/data-science-for-cybersecurity-2024

    Course Attachments

    Use of Smart Devices During Class

    Smart phones, tablets, and other portable devices may be used only with the instructor's approval.
