# Required software
| Type | Purpose | Related material |
|---|---|---|
| Anaconda | Environment setup | 4 |
| Pandas | DataFrame: rows are records (cases), columns are features (variables) | Official tutorial; basic syntax |
| Seaborn | Visualization | Intro |
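
A quick way to confirm that the packages in the table are available in the active environment is to query their installed versions. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is simply the two Python libraries named above.

```python
from importlib import metadata

# Python libraries named in the table above.
PACKAGES = ["pandas", "seaborn"]


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

A package reported as "not installed" can usually be added from the Anaconda prompt with `conda install <package>`.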
# Python packages
# Python scientific computing
- https://scipy-lectures.org/
- https://github.com/jakevdp/PythonDataScienceHandbook
# Python for R and Matlab users
- http://mathesaurus.sourceforge.net/r-numpy.html
- https://numpy.org/doc/stable/user/numpy-for-matlab.html
# Textbook / materials
- An Introduction to Statistical Learning with Applications in Python, by James, Witten, Hastie, Tibshirani, and Taylor
- https://www.statlearning.com/
- Slides and videos for Statistical Learning MOOC by Hastie and Tibshirani
- https://www.edx.org/learn/python/stanford-university-statistical-learning-with-python
- https://www.youtube.com/playlist?list=PLoROMvodv4rOzrYsAxzQyHb8n_RWNuS1e
- https://www.youtube.com/playlist?list=PLoROMvodv4rNHU1-iPeDRH-J0cL-CrIda
- For the exercises of each chapter, there is a GitHub repository of solutions provided by students.
- https://github.com/hardikkamboj/An-Introduction-to-Statistical-Learning
- http://blog.princehonest.com/stat-learning/
# Some common statistics
Operations on a DataFrame:
- Estimates of location: mean, median, trimmed mean, mode
- Estimates of variability: variance, mean (or median) absolute deviation, interquartile range (IQR, the difference between the 75th and 25th percentiles)
- Basic filtering, reshaping, and combining
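
The operations listed above can be sketched with pandas as follows. The toy DataFrame, the `labels` lookup table, and the `trimmed_mean` helper are hypothetical illustrations, not part of the course material; the 20% trimming fraction is an arbitrary choice.

```python
import pandas as pd

# Hypothetical toy data, not one of the course datasets.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 2.0, 3.0, 4.0, 100.0],
})
x = df["x"]


def trimmed_mean(s, frac=0.2):
    """Mean after dropping a fraction `frac` of the values from each end."""
    k = int(len(s) * frac)
    return s.sort_values().iloc[k:len(s) - k].mean()


# Estimates of location.
location = {
    "mean": x.mean(),
    "median": x.median(),
    "trimmed_mean": trimmed_mean(x),
    "mode": x.mode().iloc[0],
}

# Estimates of variability.
variability = {
    "variance": x.var(),  # sample variance (ddof=1)
    "mean_abs_dev": (x - x.mean()).abs().mean(),
    "median_abs_dev": (x - x.median()).abs().median(),
    "iqr": x.quantile(0.75) - x.quantile(0.25),
}

# Filtering: keep the rows that satisfy a condition.
small = df[df["x"] < 10]

# Reshaping: long to wide, one column per group.
wide = (
    df.assign(obs=df.groupby("group").cumcount())
      .pivot(index="obs", columns="group", values="x")
)

# Combining: attach a lookup table with a key-based merge.
labels = pd.DataFrame({"group": ["a", "b"], "label": ["control", "treated"]})
combined = df.merge(labels, on="group", how="left")

print(location)
print(variability)
```

The outlier `100.0` pulls the mean far above the median and the trimmed mean, which is why the robust estimates are listed alongside the mean.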
# Data
# Wage data
This dataset contains wage data for 3,000 male workers in the Mid-Atlantic region.

Figures: scatterplot and boxplot.
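
A scatterplot and boxplot for this kind of data could be drawn with Seaborn roughly as below. This is only a sketch: the data are synthetic stand-ins, and the column names `age`, `education`, and `wage` are assumptions about how the real dataset is laid out.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the Wage data; all values are made up.
rng = np.random.default_rng(0)
n = 300
wage_df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "education": rng.choice(
        ["HS Grad", "College Grad", "Advanced Degree"], size=n
    ),
})
wage_df["wage"] = 60 + 0.5 * wage_df["age"] + rng.normal(0, 15, size=n)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# Scatterplot: one numeric variable against another.
sns.scatterplot(data=wage_df, x="age", y="wage", ax=axes[0])
# Boxplot: distribution of a numeric variable within each category.
sns.boxplot(data=wage_df, x="education", y="wage", ax=axes[1])
fig.tight_layout()
```

In a notebook the figure displays automatically; in a script, call `plt.show()` or `fig.savefig(...)`.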
# Stock Market data
This dataset contains the daily percentage returns of the S&P 500 stock index from 2001 to 2005.

Figures: boxplot and heatmap.
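
For daily returns, a boxplot can compare a lagged return across up and down days, and a heatmap can show the correlation matrix. The sketch below uses random numbers rather than the real data; the column names `Lag1`, `Lag2`, `Today`, and `Direction` are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the stock market data; all values are made up.
rng = np.random.default_rng(1)
n = 250
smarket = pd.DataFrame(
    rng.normal(0, 1.1, size=(n, 3)), columns=["Lag1", "Lag2", "Today"]
)
smarket["Direction"] = np.where(smarket["Today"] >= 0, "Up", "Down")

# Pairwise correlations between the numeric columns.
corr = smarket[["Lag1", "Lag2", "Today"]].corr()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# Boxplot: previous day's return, split by today's direction.
sns.boxplot(data=smarket, x="Direction", y="Lag1", ax=axes[0])
# Heatmap: correlation matrix with the values annotated in each cell.
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=axes[1])
fig.tight_layout()
```

Fixing `vmin=-1` and `vmax=1` keeps the colour scale symmetric around zero, so weak correlations do not look strong.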
# Gene Expression Data
This dataset contains the expression levels of 6,830 genes in 64 cancer cell lines. The cancer type of each cell line is also recorded.

Figure: scatterplot.
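
With thousands of genes per cell line, a scatterplot needs a two-dimensional summary of each cell line. One common choice, assumed here and not confirmed by these notes, is to plot the first two principal components and colour the points by cancer type. The data and the type labels below are synthetic placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in: 64 cell lines by 6,830 genes, with made-up type labels.
rng = np.random.default_rng(2)
n_lines, n_genes = 64, 6830
cancer_types = rng.choice(["TYPE_A", "TYPE_B", "TYPE_C"], size=n_lines)
expr = rng.normal(size=(n_lines, n_genes))

# Centre each gene, then get principal component scores from the SVD.
centered = expr - expr.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :2] * S[:2]  # projection onto the first two components

pcs = pd.DataFrame(scores, columns=["PC1", "PC2"])
pcs["cancer_type"] = cancer_types

fig, ax = plt.subplots(figsize=(6, 5))
sns.scatterplot(data=pcs, x="PC1", y="PC2", hue="cancer_type", ax=ax)
```

Because the synthetic values are pure noise, no clusters appear here; with real expression data, cell lines of the same cancer type would be expected to sit closer together.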
# Auto data
This dataset contains fuel efficiency, horsepower, and other information for 392 cars.

Figures: jointplot, displot, and pairplot.
# Bikeshare Data
This dataset contains hourly and daily counts of bike rentals in the Capital Bikeshare system in 2011 and 2012, along with weather and seasonal information.

Figure: line plot.




