Type of Credit: Elective
Credit(s)
Number of Students
The explosion of digital data represents an unprecedented opportunity for social research. However, fully leveraging the information revolution requires not only downstream analytical skills, such as data visualization and modeling, but also critical upstream skills: data collection, cleaning, curation, and wrangling. This course covers the full stack of tools in the data science toolkit, with a particular, hands-on emphasis on the upstream skills.
• Coding Basics & Research Design
• Data Collection
• Data Wrangling & Curation
• Data Cleaning
• Data Visualization
• Data Analysis
• Presentation of Findings
This course was created with IDAS students in mind. PhD and MA students from other graduate programs are welcome to join. No prerequisites and no prior coding experience are required.
Competency Description
In terms of practical data science skills, you will learn:
1. Upstream skills: data collection, cleaning, curation, wrangling
2. Downstream skills: data visualization, analysis, presentation
In terms of applying data science to social science research, you will learn how to:
3. Enhance a research design with creative data and high-validity measures
4. Design a series of visualizations and analyses that support a logical argument
Course Week | Flexible Supplemental Instruction Week | Flexible Supplemental Instruction Type |
---|---|---|
Week | Topic | Content and Reading Assignment | Teaching Activities and Homework |
1 | Role of Data Science in Social Science Research | | |
2 | Coding I | data types, control structures | Coding task |
3 | Coding II | functions, classes | Coding task |
4 | File Management & I/O | os, dir, paths, file types: xlsx, csv, json, xml, txt | Coding task |
5 | Data Collection | online APIs, web-scraping | Coding task |
6 | Wrangling & Curation I | selection queries, simple covariates | Coding task & Datathon I: Collect raw data files from the Internet |
7 | Wrangling & Curation II | grouping, aggregate functions and aggregate covariates | Coding task |
8 | Wrangling & Curation III | joins, melt & pivot, indexes & look-up tables | Coding task |
9 | Data Cleaning | recoding, missing values, errors, disambiguation | Coding task |
10 | Visualization I | univariate & bivariate graphs | Coding task & Datathon II: Produce Clean Data Table from Raw Data |
11 | Visualization II | multivariate graphs | Coding task |
12 | Visualization III | dynamic graphs, dashboards | Coding task |
13 | Analysis I | descriptive statistics, t-tests of means & proportions | Coding task & Datathon III: Produce Graphs from Clean Data |
14 | Analysis II | clustering, unsupervised labels as covariates | Coding task |
15 | Analysis III | linear and logistic regression | Coding task |
16 | Research Poster Session | present research poster | Datathon IV: Describe and Model Clean Data |
Task | Points per Assessment | Percent of Semester Grade |
Attendance | After TWO FREE ABSENCES, each unexcused absence is -5% | 0% |
Tasks | 10 tasks * 2% each | 20% |
Datathons | 4 challenges * 10% each | 40% |
Research Poster | 1 poster & presentation | 40% |
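As a rough illustration of how the weights above combine, the sketch below computes a semester grade in Python. It is not an official grading script. The function name `semester_grade` is hypothetical, each score is assumed to be a fraction of full credit between 0 and 1, and the attendance rule is read as subtracting 5 percentage points from the semester total for each unexcused absence beyond the two free ones, with the result floored at 0.

```python
def semester_grade(task_scores, datathon_scores, poster_score, unexcused_absences=0):
    """Return a semester grade in percent (hypothetical helper, not an official tool).

    Assumptions: every score is a fraction of full credit in [0, 1], and each
    unexcused absence after the two free ones subtracts 5 percentage points.
    """
    if len(task_scores) != 10:
        raise ValueError("expected 10 task scores")
    if len(datathon_scores) != 4:
        raise ValueError("expected 4 datathon scores")

    tasks = sum(task_scores) * 2.0            # 10 tasks * 2% each = 20%
    datathons = sum(datathon_scores) * 10.0   # 4 datathons * 10% each = 40%
    poster = poster_score * 40.0              # 1 poster & presentation = 40%

    # Only absences beyond the two free ones are penalized.
    penalty = 5.0 * max(0, unexcused_absences - 2)

    # Assumption: the grade cannot drop below zero.
    return max(0.0, tasks + datathons + poster - penalty)


if __name__ == "__main__":
    # Full marks on everything, three unexcused absences (one beyond the free two).
    print(semester_grade([1.0] * 10, [1.0] * 4, 1.0, unexcused_absences=3))
```

Under these assumptions, a student with full marks and two or fewer unexcused absences keeps the whole 100%, since the attendance row itself carries 0% of the grade and only deducts.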
Molin, S. (2021). Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization. Packt Publishing Ltd.
Moodle & Google Drive Links – To Be Announced