# Required software

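| Type | Purpose | Related resources |
|---|---|---|
| Anaconda | Environment setup | |
| Pandas | DataFrame. Rows: records (cases). Columns: features (variables). | Official tutorial; basic syntax |
| Seaborn | Visualization | Intro |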
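As a quick illustration of the "rows are records, columns are features" convention, here is a minimal sketch. The toy values and the column names (`age`, `wage`, `education`) are made up for illustration and are not taken from any course dataset.

```python
import pandas as pd

# Hypothetical toy data: each row is one record (case),
# each column is one feature (variable).
df = pd.DataFrame(
    {
        "age": [25, 40, 31],
        "wage": [70.5, 120.0, 95.2],
        "education": ["HS", "College", "College"],
    }
)

n_records, n_features = df.shape  # (number of rows, number of columns)
print(n_records, n_features)
print(df.dtypes)                  # one dtype per feature

first_record = df.iloc[0]         # one row: a single case
wage_feature = df["wage"]         # one column: a single variable
```

Selecting with `iloc` returns a record, while selecting by column name returns a feature across all records.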

# Python packages

# Python scientific computing

  • https://scipy-lectures.org/
  • https://github.com/jakevdp/PythonDataScienceHandbook

# Python for R and Matlab users

  • http://mathesaurus.sourceforge.net/r-numpy.html
  • https://numpy.org/doc/stable/user/numpy-for-matlab.html

# Textbook / Materials

  • An Introduction to Statistical Learning with Applications in Python
    Authors: James, Witten, Hastie, Tibshirani, and Taylor
    • https://www.statlearning.com/
  • Slides and videos for Statistical Learning MOOC by Hastie and Tibshirani
    • https://www.edx.org/learn/python/stanford-university-statistical-learning-with-python
    • https://www.youtube.com/playlist?list=PLoROMvodv4rOzrYsAxzQyHb8n_RWNuS1e
    • https://www.youtube.com/playlist?list=PLoROMvodv4rNHU1-iPeDRH-J0cL-CrIda
  • For the exercises in each chapter, student-provided solutions are available:
    • https://github.com/hardikkamboj/An-Introduction-to-Statistical-Learning
    • http://blog.princehonest.com/stat-learning/

# Some common statistics

Operations on a DataFrame

  • Estimate of location
    • Mean, median, trimmed mean, mode
  • Estimate of variability
    • Variance, mean (or median) absolute deviation, interquartile range (IQR: the difference between the 75th and 25th percentiles)
  • Basic filtering, reshaping and combining
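The estimates listed above can be sketched with plain pandas. This is a minimal example under stated assumptions: the helper names `location_estimates` and `variability_estimates`, the toy series, and the toy tables are all hypothetical, and the trimmed mean here simply drops a fixed number of values from each end.

```python
import pandas as pd


def location_estimates(s: pd.Series, trim: int = 1) -> dict:
    """Estimates of location; `trim` values are dropped from each end."""
    trimmed = s.sort_values().iloc[trim:len(s) - trim]
    return {
        "mean": s.mean(),
        "median": s.median(),
        "trimmed_mean": trimmed.mean(),
        "mode": s.mode().iloc[0],  # first mode if several values tie
    }


def variability_estimates(s: pd.Series) -> dict:
    """Estimates of variability."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return {
        "variance": s.var(),  # sample variance (ddof=1)
        "mean_abs_dev": (s - s.mean()).abs().mean(),
        "median_abs_dev": (s - s.median()).abs().median(),
        "iqr": q3 - q1,
    }


values = pd.Series([1, 2, 2, 3, 4, 5, 100])  # 100 is a deliberate outlier
loc = location_estimates(values)
var = variability_estimates(values)
print(loc)
print(var)

# Basic filtering, reshaping and combining on a toy table
df = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30], "y": [1.0, 2.0, 3.0]})
filtered = df[df["x"] > 10]                        # filtering: keep rows with x > 10
long = df.melt(id_vars="id", value_vars=["x", "y"],
               var_name="variable", value_name="value")  # reshaping: wide to long
labels = pd.DataFrame({"id": [1, 2, 3], "group": ["a", "b", "a"]})
combined = df.merge(labels, on="id", how="left")   # combining: join on id
```

With the outlier present, the mean is pulled far above the median and the trimmed mean, which is why robust estimates are worth reporting alongside it.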

# Data

# Wage data

This dataset contains wage data for 3,000 male workers in the Mid-Atlantic region.

Scatterplot and Boxplot
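A scatterplot and a boxplot for data of this kind might look like the sketch below. It uses synthetic data rather than the real Wage file, and the column names `age`, `wage`, and `education` are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in for the Wage data; column names are assumptions
wage_df = pd.DataFrame(
    {
        "age": rng.integers(18, 65, size=n),
        "education": rng.choice(["HS", "College", "Advanced"], size=n),
    }
)
wage_df["wage"] = 50 + 0.8 * wage_df["age"] + rng.normal(0, 15, size=n)

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
# Scatterplot: two numeric variables
sns.scatterplot(data=wage_df, x="age", y="wage", ax=ax_scatter)
# Boxplot: one numeric variable split by a categorical variable
sns.boxplot(data=wage_df, x="education", y="wage", ax=ax_box)
fig.tight_layout()
```

In an interactive session, `plt.show()` displays the figure.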

# Stock Market data

This dataset contains the daily percentage returns of the S&P 500 stock index from 2001 to 2005.

Boxplot and heatmap
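The following sketch shows one way to draw a boxplot and a correlation heatmap for return data. The returns are simulated, and the column names `Today`, `Lag1`, `Lag2`, and `Direction` are assumptions chosen for illustration, not read from the actual dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
n = 250
returns = rng.normal(0, 1, size=n + 2)  # simulated daily percentage returns

# Assumed layout: today's return plus the returns of the two previous days
stock_df = pd.DataFrame(
    {
        "Today": returns[2:],
        "Lag1": returns[1:-1],
        "Lag2": returns[:-2],
    }
)
stock_df["Direction"] = np.where(stock_df["Today"] >= 0, "Up", "Down")

fig, (ax_box, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))
# Boxplot: previous day's return, split by today's direction
sns.boxplot(data=stock_df, x="Direction", y="Lag1", ax=ax_box)
# Heatmap: pairwise correlations between the numeric columns
corr = stock_df[["Today", "Lag1", "Lag2"]].corr()
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax_heat)
fig.tight_layout()
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the color scale comparable across different correlation matrices.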

# Gene Expression Data

This dataset contains the expression levels of 6,830 genes in 64 cancer cell lines. The cancer type of each cell line is also recorded.

Scatterplot
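With thousands of genes, a scatterplot needs two axes to plot against. One common option, assumed here and not stated in these notes, is to project the cell lines onto their first two principal components. The sketch below uses random data with fewer genes, and the cancer type labels are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
n_lines, n_genes = 64, 200  # the real data has 6,830 genes; fewer here for speed
cancer_type = rng.choice(["A", "B", "C"], size=n_lines)  # hypothetical labels
X = rng.normal(size=(n_lines, n_genes))                  # synthetic expression matrix

# Principal components via SVD of the column-centered matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * S[:2]  # coordinates of each cell line on PC1 and PC2

pc_df = pd.DataFrame(scores, columns=["PC1", "PC2"])
pc_df["cancer_type"] = cancer_type

fig, ax = plt.subplots()
sns.scatterplot(data=pc_df, x="PC1", y="PC2", hue="cancer_type", ax=ax)
```

Because the data here are pure noise, no clusters should appear; with real expression data, cell lines of the same cancer type may fall close together.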

# Auto data

This dataset contains fuel efficiency, horsepower, and other information for 392 cars.

Jointplot, displot and pairplot

# Bikeshare Data

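This dataset contains hourly and daily counts of bikes rented through the Capital Bikeshare system in 2011 and 2012, along with weather and seasonal information.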

Line plot