Type of Credit: Elective
Credit(s)
Number of Students
https://sites.google.com/view/mikehsiao/teaching/ds4cs-2025
This course is an introductory class for students interested in cybersecurity analysis using machine learning methods. Students will become familiar with the tools, algorithms, concepts, and execution environment needed to perform data analysis on cybersecurity data, and will learn to design solutions to security-related problems using data analysis algorithms and tools. The class covers related security concepts, data analysis theories, research papers, and background knowledge. We will also introduce several security systems that implement data analysis algorithms to achieve their security goals.
Note that students should have completed programming courses beforehand, such as Programming Language I/II. The programming language used in this class is Python (however, we will NOT cover any Python language tutorial), and we will leverage TensorFlow and Keras for AI-based analysis. You MUST be familiar with writing programs, be able to search for solutions in online documentation and on Stack Overflow, and debug on your own. This course REQUIRES students to implement Python scripts in homework and projects.
Note that this course is designed for MIS graduate students, for Advanced Information System Development. The class runs for 16 weeks.
Competency Descriptions
Understand the concept of detection, the profiling subject, profiling techniques, misuse detection, and anomaly detection.
Understand the concept of static analysis and dynamic analysis.
Understand the data analysis algorithms: distance function, similarity function, classification, clustering, and machine learning algorithms for security applications.
Understand neural network structures and algorithms.
Understand the use of language models to analyze security-related data.
Understand the operation of security-related information systems from the perspective of a data-driven system: intrusion detection system, anomaly detection system, spam mail filter system, and sequence analysis system.
Course Week | Flexible Supplemental Instruction Week | Flexible Supplemental Instruction Type |
---|---|---|
W1 (02/19): Regression (M03)
Model, Linear Regression (MSE, Gradient Descent)
W2 (02/26): Classification (M04)
Logistic Regression (Cross-Entropy)
Support Vector Machine
Evaluation
W3 (03/05): Tree (M06)
Tree and Random Forest
Entropy, Information Gain, Gini, Chi-Square, Variance
W4 (03/12): Clustering (M07)
K-means
Hierarchical Clustering
DBSCAN
W5 (03/19): Problematic Data (M08, M09)
Dimension Reduction, PCA
Problematic Data
W6 (03/26): Neural Network
Basics (N01)
Convolution (N02)
(04/02): No class.
W7 (04/09): Recurrent NN (N03)
Static Analysis: Windows PE file and image analysis (D01)
Understanding LSTM Networks (N03-1)
Dynamic Analysis: Malware call and sequence analysis (D02)
Text classification with an RNN (N03-2)
W8 (04/16): Midterm (take-home exam, due before 04/23)
W9 (04/23): Latent Space
Auto-Encoder (N04)
Activation Function (N05)
W10 (04/30): Language Model (N06)
word2vec (CBOW, skip-gram), fastText (supervised, unsupervised)
Transformer, Self-Attention, BERT
W11 (05/07): Language Model
Basic text classification (N06-2), Classify text with BERT (N06-3)
HuggingFace NLP Course (N06-4)
1. Transformer Models, 2. Using Transformers, 3. Fine-Tuning a Pretrained Model
7-3. Fine-tuning a masked language model
Packet Analysis (D03)
W12 (05/14): Language Model and Others
Transfer learning & fine-tuning
LoRA, Parameter-Efficient Fine-Tuning (PEFT)
Classification on imbalanced data
(05/21): No class. University Anniversary.
W13 (05/28): Large Language Model
NLP Course, Diffusion Course
W14 (06/04): Anomaly Detection
Variational Autoencoder (N04-2)
V. Chandola, A. Banerjee, and V. Kumar, "Anomaly Detection: A Survey," ACM Computing Surveys, vol. 41, no. 3, July 2009.
Novelty and Outlier Detection
One-class SVM
Self-Organizing Map
W15 (06/11): Project Demo
W16 (06/18): Final (take-home exam, due 06/18 at 23:59)
Homework (50%): programming exercises and essays. You MUST read the ACADEMIC INTEGRITY section before taking this class.
Project (10%): students write an analysis program on a security-related data set to demonstrate their understanding of security issues and their data analysis skills. A proposal, a report, a presentation, and code on GitHub are required.
Midterm (20%)
Final (20%)
TBA
https://sites.google.com/view/mikehsiao/teaching/ds4cs-2025