Academic Year / Semester | Academic Year 101, Semester 2 (Spring Semester, 2013)
Course Department | Computer Science, undergraduate third and fourth year; Computer Science master's program, first and second year
Course Name | 網路搜索與探勘 (Web Search and Mining)
Instructor | 蔡銘峰 (TSAI MING-FENG)
Title | Assistant Professor
No. of Credits | 3.0
Type of Credit | Elective
Prerequisite(s) |
The goals of this course are: 1) to provide an overview of research in Web Search and Mining, 2) to systematically review the core research topics in the field, 3) to showcase the most recent research progress, and 4) to give students enough training to do research in the field, along with an opportunity to work on a research project.
Part I: Web Search
• Evaluation
• Retrieval Model
• Language Model
• Link Analysis
• Web Crawling
Part II: Web Mining
• Classification
• Clustering
• Learning to Rank
• Recommendation
Part III: Data-Intensive Information Processing
• Introduction to MapReduce
• MapReduce: the Programming Environment
1. (1 week) Introduction: Goals and history of Web Search and Mining; IR vs. Web Search; DM vs. Web Mining.
2. (2 weeks) Web Search 1 - Ranking Evaluation; Probabilistic Information Retrieval
3. (2 weeks) Web Search 2 - Language Model for Information Retrieval
4. (2 weeks) Web Search 3 - Processing Text: Text statistics; Link Analysis
5. (1 week) Web Search 4 - Web Crawling
6. (1 week) Web Mining 1 - Classification and Naive Bayes
7. (2 weeks) Web Mining 2 - Support Vector Machines; K Nearest Neighbor
8. (2 weeks) Web Mining 3 - Clustering: Flat clustering and Hierarchical clustering
9. (1 week) Web Mining 4 - Clustering: K-Means Clustering; Clustering and Search
10. (2 weeks) Data-Intensive Information Processing - Overview of Cloud Computing; MapReduce; Hadoop
The course will involve lectures by the instructor, student presentations, and research projects on major topics in Web Search and Mining. Students are expected to read a number of research papers and present some of them in class. There will be a midterm exam and a few assignments. Students are also required to complete a course project (group work is allowed and encouraged).
Grade assignments; prepare assignments; answer questions
Grading will be based on the following weighting scheme:
• Class participation: 10%
• Assignments: 30%
• Midterm exam: 30%
• Project: 30%
• Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze.
• Search Engines: Information Retrieval in Practice, by Bruce Croft, Donald Metzler, Trevor Strohman.
• Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer.
• Hadoop: The Definitive Guide, by Tom White.