Python金融大数据分析 (Python for Finance): Chapter 6, Financial Time Series (notes)

我会带着你远行 2022-05-21 01:14

* Chapter 6 Financial Time Series
* 6.1 pandas Basics
* 6.1.1 First Steps with the DataFrame Class
* 6.1.2 Second Steps with the DataFrame Class
* 6.1.3 Basic Analytics
* 6.1.4 The Series Class
* 6.1.5 GroupBy Operations
* 6.2 Financial Data
* 6.3 Regression Analysis
* 6.4 High-Frequency Data

# Chapter 6 Financial Time Series #

## 6.1 pandas Basics ##

### 6.1.1 First Steps with the DataFrame Class ###

    import pandas as pd
    import numpy as np

    df = pd.DataFrame([10, 20, 30, 40], columns=['numbers'], index=['a', 'b', 'c', 'd'])
    df
    #    numbers
    # a       10
    # b       20
    # c       30
    # d       40

    df.index
    # Index(['a', 'b', 'c', 'd'], dtype='object')

    df.columns
    # Index(['numbers'], dtype='object')

    # Select a single row by its index label
    df.loc['c']
    # numbers    30
    # Name: c, dtype: int64

    # Select several rows by label
    df.loc[['a', 'b']]
    #    numbers
    # a       10
    # b       20

    # Select rows through a slice of the index object
    df.loc[df.index[1:3]]
    #    numbers
    # b       20
    # c       30

    df.sum()
    # numbers    100
    # dtype: int64

    # Square every element, first with apply, then with vectorized arithmetic
    df.apply(lambda x: x ** 2)
    #    numbers
    # a      100
    # b      400
    # c      900
    # d     1600

    df ** 2
    #    numbers
    # a      100
    # b      400
    # c      900
    # d     1600

    # Add a new column
    df['floats'] = (1.5, 2.5, 3.5, 4.5)
    df
    #    numbers  floats
    # a       10     1.5
    # b       20     2.5
    # c       30     3.5
    # d       40     4.5

    df['floats']
    # a    1.5
    # b    2.5
    # c    3.5
    # d    4.5
    # Name: floats, dtype: float64

    df.floats
    # a    1.5
    # b    2.5
    # c    3.5
    # d    4.5
    # Name: floats, dtype: float64

    # A whole DataFrame can define a new column; values are aligned on the index
    df['names'] = pd.DataFrame(['Yves', 'Guido', 'Felix', 'Francesc'], index=['d', 'a', 'b', 'c'])
    df
    #    numbers  floats     names
    # a       10     1.5     Guido
    # b       20     2.5     Felix
    # c       30     3.5  Francesc
    # d       40     4.5      Yves

    # Note: DataFrame.append was removed in pandas 2.0, where pd.concat replaces it.
    # The outputs below come from an older pandas version.
    df.append({'numbers': 100, 'floats': 5.75, 'names': 'Henry'}, ignore_index=True)
    #    numbers  floats     names
    # 0       10    1.50     Guido
    # 1       20    2.50     Felix
    # 2       30    3.50  Francesc
    # 3       40    4.50      Yves
    # 4      100    5.75     Henry

    # Appending a DataFrame keeps the index labels
    df = df.append(pd.DataFrame({'numbers': 100, 'floats': 5.75, 'names': 'Henry'}, index=['z', ]))
    df
    #    floats     names  numbers
    # a    1.50     Guido       10
    # b    2.50     Felix       20
    # c    3.50  Francesc       30
    # d    4.50      Yves       40
    # z    5.75     Henry      100

    # The default (left) join keeps only the index values of df
    df.join(pd.DataFrame([1, 4, 9, 16, 25], index=['a', 'b', 'c', 'd', 'y'], columns=['squares', ]))
    #    floats     names  numbers  squares
    # a    1.50     Guido       10      1.0
    # b    2.50     Felix       20      4.0
    # c    3.50  Francesc       30      9.0
    # d    4.50      Yves       40     16.0
    # z    5.75     Henry      100      NaN

    # An outer join keeps the union of both indexes
    df = df.join(pd.DataFrame([1, 4, 9, 16, 25],
                              index=['a', 'b', 'c', 'd', 'y'], columns=['squares', ]), how='outer')
    df
    #    floats     names  numbers  squares
    # a    1.50     Guido     10.0      1.0
    # b    2.50     Felix     20.0      4.0
    # c    3.50  Francesc     30.0      9.0
    # d    4.50      Yves     40.0     16.0
    # y     NaN       NaN      NaN     25.0
    # z    5.75     Henry    100.0      NaN

    # Missing values (NaN) are ignored by mean() and std()
    df[['numbers', 'squares']].mean()
    # numbers    40.0
    # squares    11.0
    # dtype: float64

    df[['numbers', 'squares']].std()
    # numbers    35.355339
    # squares     9.669540
    # dtype: float64

### 6.1.2 Second Steps with the DataFrame Class ###

    a = np.random.standard_normal((9, 4))
    a.round(6)
    # array([[ 0.109076, -1.05275 ,  1.253471,  0.39846 ],
    #        [-1.561175, -1.997425,  1.158739, -2.030734],
    #        [ 0.764723,  0.760368,  0.864103, -0.174079],
    #        [ 2.429043,  0.281962, -0.496606,  0.009445],
    #        [-1.679758, -1.02374 , -1.135922,  0.077649],
    #        [-0.247692,  0.301198,  2.156474,  1.537902],
    #        [ 1.162934,  2.102327, -0.4501  ,  0.812529],
    #        [-0.374749, -0.818229, -1.013962, -0.476855],
    #        [ 0.626347,  2.294829, -1.29531 , -0.031501]])

    df = pd.DataFrame(a)
    df
    #           0         1         2         3
    # 0  0.109076 -1.052750  1.253471  0.398460
    # 1 -1.561175 -1.997425  1.158739 -2.030734
    # 2  0.764723  0.760368  0.864103 -0.174079
    # 3  2.429043  0.281962 -0.496606  0.009445
    # 4 -1.679758 -1.023740 -1.135922  0.077649
    # 5 -0.247692  0.301198  2.156474  1.537902
    # 6  1.162934  2.102327 -0.450100  0.812529
    # 7 -0.374749 -0.818229 -1.013962 -0.476855
    # 8  0.626347  2.294829 -1.295310 -0.031501

    # Assign a flat list of names; a nested list would create a MultiIndex
    df.columns = ['No1', 'No2', 'No3', 'No4']
    df
    #         No1       No2       No3       No4
    # 0  0.109076 -1.052750  1.253471  0.398460
    # 1 -1.561175 -1.997425  1.158739 -2.030734
    # 2  0.764723  0.760368  0.864103 -0.174079
    # 3  2.429043  0.281962 -0.496606  0.009445
    # 4 -1.679758 -1.023740 -1.135922  0.077649
    # 5 -0.247692  0.301198  2.156474  1.537902
    # 6  1.162934  2.102327 -0.450100  0.812529
    # 7 -0.374749 -0.818229 -1.013962 -0.476855
    # 8  0.626347  2.294829 -1.295310 -0.031501

    # Value in column No2 at index label 3
    df['No2'][3]
    # 0.2819621128403918

    # Nine month-end dates; pandas 2.2 and later prefer the alias 'ME' over 'M'
    dates = pd.date_range('2018-01-01', periods=9, freq='M')
    dates
    # DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
    #                '2018-05-31', '2018-06-30', '2018-07-31',
    #                '2018-08-31', '2018-09-30'],
    #               dtype='datetime64[ns]', freq='M')

Parameters of the date\_range function

<table>
<thead>
<tr> <th>Parameter</th> <th>Format</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>start</td> <td>string/datetime</td> <td>left bound for generating dates</td> </tr>
<tr> <td>end</td> <td>string/datetime</td> <td>right bound for generating dates</td> </tr>
<tr> <td>periods</td> <td>integer/None</td> <td>number of periods (if start or end is omitted)</td> </tr>
<tr> <td>freq</td> <td>string/DateOffset</td> <td>frequency string, e.g. 5D (5 days)</td> </tr>
<tr> <td>tz</td> <td>string/None</td> <td>time zone name for a localized index</td> </tr>
<tr> <td>normalize</td> <td>Boolean, default False</td> <td>normalize start and end to midnight</td> </tr>
<tr> <td>name</td> <td>string, default None</td> <td>name of the resulting index</td> </tr>
</tbody>
</table>

    df.index = dates
    df
    #                  No1       No2       No3       No4
    # 2018-01-31  0.109076 -1.052750  1.253471  0.398460
    # 2018-02-28 -1.561175 -1.997425  1.158739 -2.030734
    # 2018-03-31  0.764723  0.760368  0.864103 -0.174079
    # 2018-04-30  2.429043  0.281962 -0.496606  0.009445
    # 2018-05-31 -1.679758 -1.023740 -1.135922  0.077649
    # 2018-06-30 -0.247692  0.301198  2.156474  1.537902
    # 2018-07-31  1.162934  2.102327 -0.450100  0.812529
    # 2018-08-31 -0.374749 -0.818229 -1.013962 -0.476855
    # 2018-09-30  0.626347  2.294829 -1.295310 -0.031501

Frequency parameter values of the date\_range function

<table>
<thead>
<tr> <th>Alias</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>B</td> <td>business day</td> </tr>
<tr> <td>C</td> <td>custom business day (experimental)</td> </tr>
<tr> <td>D</td> <td>calendar day</td> </tr>
<tr> <td>W</td> <td>weekly</td> </tr>
<tr> <td>M</td> <td>month end</td> </tr>
<tr> <td>BM</td> <td>last business day of the month</td> </tr>
<tr> <td>MS</td> <td>month start</td> </tr>
<tr> <td>BMS</td> <td>first business day of the month</td> </tr>
<tr> <td>Q</td> <td>quarter end</td> </tr>
<tr> <td>BQ</td> <td>last business day of the quarter</td> </tr>
<tr> <td>QS</td> <td>quarter start</td> </tr>
<tr> <td>BQS</td> <td>first business day of the quarter</td> </tr>
<tr> <td>A</td> <td>year end</td> </tr>
<tr> <td>BA</td> <td>last business day of the year</td> </tr>
<tr> <td>AS</td> <td>year start</td> </tr>
<tr> <td>BAS</td> <td>first business day of the year</td> </tr>
<tr> <td>H</td> <td>hourly</td> </tr>
<tr> <td>T</td> <td>minutely</td> </tr>
<tr> <td>S</td> <td>secondly</td> </tr>
<tr> <td>L</td> <td>milliseconds</td> </tr>
<tr> <td>U</td>
<td>microseconds</td> </tr>
</tbody>
</table>

A DataFrame object is usually generated from an ndarray object. The reverse also works: NumPy's array function turns a DataFrame back into an ndarray.

    np.array(df).round(6)
    # array([[ 0.109076, -1.05275 ,  1.253471,  0.39846 ],
    #        [-1.561175, -1.997425,  1.158739, -2.030734],
    #        [ 0.764723,  0.760368,  0.864103, -0.174079],
    #        [ 2.429043,  0.281962, -0.496606,  0.009445],
    #        [-1.679758, -1.02374 , -1.135922,  0.077649],
    #        [-0.247692,  0.301198,  2.156474,  1.537902],
    #        [ 1.162934,  2.102327, -0.4501  ,  0.812529],
    #        [-0.374749, -0.818229, -1.013962, -0.476855],
    #        [ 0.626347,  2.294829, -1.29531 , -0.031501]])

### 6.1.3 Basic Analytics ###

    # Column-wise sum
    df.sum()
    # No1    1.228750
    # No2    0.848540
    # No3    1.040888
    # No4    0.122816
    # dtype: float64

    # Column-wise mean
    df.mean()
    # No1    0.136528
    # No2    0.094282
    # No3    0.115654
    # No4    0.013646
    # dtype: float64

    # Cumulative sum down each column
    df.cumsum()
    #                  No1       No2       No3       No4
    # 2018-01-31  0.109076 -1.052750  1.253471  0.398460
    # 2018-02-28 -1.452099 -3.050176  2.412210 -1.632274
    # 2018-03-31 -0.687376 -2.289807  3.276313 -1.806353
    # 2018-04-30  1.741667 -2.007845  2.779707 -1.796908
    # 2018-05-31  0.061909 -3.031585  1.643786 -1.719259
    # 2018-06-30 -0.185783 -2.730387  3.800259 -0.181357
    # 2018-07-31  0.977152 -0.628061  3.350160  0.631172
    # 2018-08-31  0.602403 -1.446289  2.336198  0.154317
    # 2018-09-30  1.228750  0.848540  1.040888  0.122816

    # Summary statistics for numerical columns
    df.describe()
    #             No1       No2       No3       No4
    # count  9.000000  9.000000  9.000000  9.000000
    # mean   0.136528  0.094282  0.115654  0.013646
    # std    1.300700  1.465006  1.256782  0.972826
    # min   -1.679758 -1.997425 -1.295310 -2.030734
    # 25%   -0.374749 -1.023740 -1.013962 -0.174079
    # 50%    0.109076  0.281962 -0.450100  0.009445
    # 75%    0.764723  0.760368  1.158739  0.398460
    # max    2.429043  2.294829  2.156474  1.537902

    # NumPy universal functions work on DataFrames; negative inputs yield NaN
    np.sqrt(df)
    #                  No1       No2       No3       No4
    # 2018-01-31  0.330266       NaN  1.119585  0.631236
    # 2018-02-28       NaN       NaN  1.076447       NaN
    # 2018-03-31  0.874484  0.871991  0.929571       NaN
    # 2018-04-30  1.558539  0.531001       NaN  0.097184
    # 2018-05-31       NaN       NaN       NaN  0.278656
    # 2018-06-30       NaN  0.548815  1.468494  1.240122
    # 2018-07-31  1.078394  1.449940       NaN  0.901404
    # 2018-08-31       NaN       NaN       NaN       NaN
    # 2018-09-30  0.791421  1.514869       NaN       NaN

    # The NaN values are simply left out of the sum
    np.sqrt(df).sum()
    # No1    4.633104
    # No2    4.916616
    # No3    4.594098
    # No4    3.148602
    # dtype: float64

    df.cumsum().plot(lw=2.0)

![Line plot of a DataFrame object][DataFrame]

[Parameters of the plot method][plot]

<table>
<thead>
<tr> <th>Parameter</th> <th>Format</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>x</td> <td>label/position, default None</td> <td>only used when column values are the x ticks</td> </tr>
<tr> <td>y</td> <td>label/position, default None</td> <td>only used when column values are the y ticks</td> </tr>
<tr> <td>subplots</td> <td>Boolean, default False</td> <td>plot each column in its own subplot</td> </tr>
<tr> <td>sharex</td> <td>Boolean, default True</td> <td>share the x axis</td> </tr>
<tr> <td>sharey</td> <td>Boolean, default False</td> <td>share the y axis</td> </tr>
<tr> <td>use_index</td> <td>Boolean, default True</td> <td>use DataFrame.index as x ticks</td> </tr>
<tr> <td>stacked</td> <td>Boolean, default False</td> <td>stack (bar plots only)</td> </tr>
<tr> <td>sort_columns</td> <td>Boolean, default False</td> <td>sort columns alphabetically before plotting</td> </tr>
<tr> <td>title</td> <td>string, default None</td> <td>plot title</td> </tr>
<tr> <td>grid</td> <td>Boolean, default False</td> <td>horizontal and vertical grid lines</td> </tr>
<tr> <td>legend</td> <td>Boolean, default True</td> <td>legend of labels</td> </tr>
<tr> <td>ax</td> <td>matplotlib axis object</td> <td>matplotlib axis object to use for plotting</td> </tr>
<tr> <td>style</td> <td>string or list/dictionary</td> <td>line style (per column)</td> </tr>
<tr> <td>kind</td> <td>'line' : line plot (default) <br>'bar' : vertical bar plot <br>'barh' : horizontal bar plot <br>'hist' : histogram <br>'box' : boxplot <br>'kde' : Kernel Density Estimation plot <br>'density' : same as 'kde' <br>'area' : area plot <br>'pie' : pie plot <br>'scatter' : scatter plot <br>'hexbin' : hexbin plot <br></td> <td>plot type</td> </tr>
<tr> <td>logx</td> <td>Boolean, default False</td> <td>logarithmic scale on the x axis</td> </tr>
<tr> <td>logy</td> <td>Boolean, default False</td> <td>logarithmic scale on the y axis</td> </tr>
<tr> <td>xticks</td> <td>sequence, default index</td> <td>x ticks</td> </tr>
<tr> <td>yticks</td> <td>sequence, default Values</td> <td>y ticks</td> </tr>
<tr> <td>xlim</td> <td>2-tuple, list</td> <td>x-axis limits</td> </tr>
<tr> <td>ylim</td> <td>2-tuple, list</td> <td>y-axis limits</td> </tr>
<tr> <td>rot</td> <td>integer, default None</td> <td>rotation of the x ticks</td> </tr>
<tr> <td>secondary_y</td> <td>Boolean/sequence, default False</td> <td>secondary y axis</td> </tr>
<tr>
<td>mark_right</td> <td>Boolean, default True</td> <td>automatic labeling of the secondary y axis</td> </tr>
<tr> <td>colormap</td> <td>string/colormap object, default None</td> <td>colormap to use for plotting</td> </tr>
<tr> <td>kwds</td> <td>keywords</td> <td>options passed on to matplotlib</td> </tr>
</tbody>
</table>

### 6.1.4 The Series Class ###

    type(df)
    # pandas.core.frame.DataFrame

    # Selecting a single column returns a Series
    df['No1']
    # 2018-01-31    0.109076
    # 2018-02-28   -1.561175
    # 2018-03-31    0.764723
    # 2018-04-30    2.429043
    # 2018-05-31   -1.679758
    # 2018-06-30   -0.247692
    # 2018-07-31    1.162934
    # 2018-08-31   -0.374749
    # 2018-09-30    0.626347
    # Freq: M, Name: No1, dtype: float64

    type(df['No1'])
    # pandas.core.series.Series

The main DataFrame methods are also available for Series objects.

Line plot of a Series object

    import matplotlib.pyplot as plt

    df['No1'].cumsum().plot(style='r', lw=2)
    plt.xlabel('date')
    plt.ylabel('value')

![Line plot of a Series object][Series]

### 6.1.5 GroupBy Operations ###

    # Add a column that assigns each month to a quarter
    df['Quarter'] = ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3']
    df
    #                  No1       No2       No3       No4 Quarter
    # 2018-01-31  0.109076 -1.052750  1.253471  0.398460      Q1
    # 2018-02-28 -1.561175 -1.997425  1.158739 -2.030734      Q1
    # 2018-03-31  0.764723  0.760368  0.864103 -0.174079      Q1
    # 2018-04-30  2.429043  0.281962 -0.496606  0.009445      Q2
    # 2018-05-31 -1.679758 -1.023740 -1.135922  0.077649      Q2
    # 2018-06-30 -0.247692  0.301198  2.156474  1.537902      Q2
    # 2018-07-31  1.162934  2.102327 -0.450100  0.812529      Q3
    # 2018-08-31 -0.374749 -0.818229 -1.013962 -0.476855      Q3
    # 2018-09-30  0.626347  2.294829 -1.295310 -0.031501      Q3

    # Group by quarter and aggregate each group
    groups = df.groupby('Quarter')
    groups.mean()
    #               No1       No2       No3       No4
    # Quarter
    # Q1      -0.229125 -0.763269  1.092104 -0.602118
    # Q2       0.167198 -0.146860  0.174649  0.541665
    # Q3       0.471511  1.192976 -0.919790  0.101391

    groups.max()
    #               No1       No2       No3       No4
    # Quarter
    # Q1       0.764723  0.760368  1.253471  0.398460
    # Q2       2.429043  0.301198  2.156474  1.537902
    # Q3       1.162934  2.294829 -0.450100  0.812529

    groups.size()
    # Quarter
    # Q1    3
    # Q2    3
    # Q3    3
    # dtype: int64

    # Grouping also works with several columns
    df['Odd_Even'] = ['Odd', 'Even', 'Odd', 'Even', 'Odd',
                      'Even', 'Odd', 'Even', 'Odd']
    groups = df.groupby(['Quarter', 'Odd_Even'])
    groups.size()
    # Quarter  Odd_Even
    # Q1       Even        1
    #          Odd         2
    # Q2       Even        2
    #          Odd         1
    # Q3       Even        1
    #          Odd         2
    # dtype: int64

    groups.mean()
    #                        No1       No2       No3       No4
    # Quarter Odd_Even
    # Q1      Even     -1.561175 -1.997425  1.158739 -2.030734
    #         Odd       0.436899 -0.146191  1.058787  0.112190
    # Q2      Even      1.090676  0.291580  0.829934  0.773673
    #         Odd      -1.679758 -1.023740 -1.135922  0.077649
    # Q3      Even     -0.374749 -0.818229 -1.013962 -0.476855
    #         Odd       0.894640  2.198578 -0.872705  0.390514

## 6.2 Financial Data ##

    import datetime
    import pandas_datareader.data as web

    start = datetime.datetime(2016, 1, 1)  # or start = '1/1/2016' or '2016-1-1'
    end = datetime.date.today()
    # Apple share prices pulled from Yahoo! Finance (needs network access; whether
    # the 'yahoo' source works depends on the pandas-datareader version)
    prices = web.DataReader('AAPL', 'yahoo', start, end)
    prices.head()

<table>
<thead>
<tr> <th>Date</th> <th>Open</th> <th>High</th> <th>Low</th> <th>Close</th> <th>Adj Close</th> <th>Volume</th> </tr>
</thead>
<tbody>
<tr> <td>2015-12-31</td> <td>107.010002</td> <td>107.029999</td> <td>104.820000</td> <td>105.260002</td> <td>100.540207</td> <td>40635300</td> </tr>
<tr> <td>2016-01-04</td> <td>102.610001</td> <td>105.370003</td> <td>102.000000</td> <td>105.349998</td> <td>100.626175</td> <td>67649400</td> </tr>
<tr> <td>2016-01-05</td> <td>105.750000</td> <td>105.849998</td> <td>102.410004</td> <td>102.709999</td> <td>98.104546</td> <td>55791000</td> </tr>
<tr> <td>2016-01-06</td> <td>100.559998</td> <td>102.370003</td> <td>99.870003</td> <td>100.699997</td> <td>96.184654</td> <td>68457400</td> </tr>
<tr> <td>2016-01-07</td> <td>98.680000</td> <td>100.129997</td> <td>96.430000</td> <td>96.449997</td> <td>92.125244</td> <td>81094400</td> </tr>
</tbody>
</table>

Parameters of the DataReader function (the parameter descriptions in the source code are already quite detailed)

<table>
<thead>
<tr> <th>Parameter</th> <th>Format</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>name</td> <td>string</td> <td>name of the data set, usually the ticker symbol</td> </tr>
<tr>
<td>data_source</td> <td>e.g. "yahoo"</td> <td>"yahoo": Yahoo! Finance,<br>"google": Google Finance,<br>"fred": St. Louis FED (FRED),<br>"famafrench": Kenneth French's data library,<br>"edgar-index": the SEC's EDGAR Index<br></td> </tr>
<tr> <td>start</td> <td>string/datetime/None</td> <td>left boundary of the range (default "2010/1/1")</td> </tr>
<tr> <td>end</td> <td>string/datetime/None</td> <td>right boundary of the range (default today)</td> </tr>
</tbody>
</table>

Historical closing prices

    prices['Close'].plot(figsize=(8, 5))

![Historical closing prices][70]

Closing prices and daily returns

    # The notes do not show how 'Return' is built; daily log returns are assumed here
    prices['Return'] = np.log(prices['Close'] / prices['Close'].shift(1))
    prices[['Close', 'Return']].plot(subplots=True, style='b', figsize=(8, 5))

![Closing prices and daily returns][70 1]

Closing prices and moving averages

    # Assumed definitions (omitted in the notes): 42-day and 252-day moving averages
    prices['42d'] = prices['Close'].rolling(window=42).mean()
    prices['252d'] = prices['Close'].rolling(window=252).mean()
    prices[['Close', '42d', '252d']].plot(figsize=(8, 5))

![Closing prices and moving averages][70 2]

Closing prices and rolling annualized volatility

    import math

    # pd.rolling_std no longer exists in current pandas; Series.rolling(...).std() replaces it
    prices['Mov_Vol'] = prices['Return'].rolling(window=252).std() * math.sqrt(252)
    prices[['Close', 'Mov_Vol', 'Return']].plot(subplots=True, style='b', figsize=(8, 7))

![Closing prices and rolling annualized volatility][70 3]

## 6.3 Regression Analysis ##

    import pandas as pd
    import urllib.request

    es_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt'
    vs_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/h_vstoxx.txt'
    es_txt = 'E:/data/es.txt'
    vs_txt = 'E:/data/vs.txt'
    urllib.request.urlretrieve(es_url, es_txt)
    urllib.request.urlretrieve(vs_url, vs_txt)

    # Data cleaning: strip all blanks from every line
    with open(es_txt, 'r') as f:
        lines = f.readlines()
    lines = [line.replace(' ', '') for line in lines]

    # Write a new text file with a usable header row
    es50_txt = 'E:/data/es50.txt'
    with open(es50_txt, 'w') as new_file:
        # DEL is a placeholder column
        new_file.write('date' + lines[3][:-1] + ';DEL' + lines[3][-1])
        new_file.writelines(lines[4:])

    with open(es50_txt, 'r') as f:
        new_lines = f.readlines()
    new_lines[:5]
    # ['date;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF;DEL\n',
    #  '31.12.1986;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n',
    #  '01.01.1987;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n',
    #  '02.01.1987;770.89;891.78;82.57;97.80;97.43;69.37;647.62;65.81\n',
    #  '05.01.1987;771.89;898.33;82.82;98.60;98.19;69.16;649.94;65.82\n']

    es = pd.read_csv(es50_txt, index_col=0, parse_dates=True, sep=';', dayfirst=True)
    es.head()
    #               SX5P    SX5E   SXXP   SXXE   SXXF   SXXA    DK5F   DKXF  DEL
    # date
    # 1986-12-31  775.00  900.82  82.76  98.58  98.06  69.06  645.26  65.56  NaN
    # 1987-01-01  775.00  900.82  82.76  98.58  98.06  69.06  645.26  65.56  NaN
    # 1987-01-02  770.89  891.78  82.57  97.80  97.43  69.37  647.62  65.81  NaN
    # 1987-01-05  771.89  898.33  82.82  98.60  98.19  69.16  649.94  65.82  NaN
    # 1987-01-06  775.92  902.32  83.28  99.19  98.83  69.50  652.49  66.06  NaN

    # The helper column has served its purpose and can be deleted
    del es['DEL']
    es.info()
    # <class 'pandas.core.frame.DataFrame'>
    # DatetimeIndex: 7673 entries, 1986-12-31 to 2016-10-04
    # Data columns (total 8 columns):
    # SX5P    7673 non-null float64
    # SX5E    7673 non-null float64
    # SXXP    7673 non-null float64
    # SXXE    7673 non-null float64
    # SXXF    7673 non-null float64
    # SXXA    7673 non-null float64
    # DK5F    7673 non-null float64
    # DKXF    7673 non-null float64
    # dtypes: float64(8)
    # memory usage: 539.5 KB

    # The more advanced options of read_csv make the import more compact and efficient:
    cols = ['SX5P', 'SX5E', 'SXXP', 'SXXE', 'SXXF', 'SXXA', 'DK5F', 'DKXF']
    es = pd.read_csv(es_url, index_col=0, parse_dates=True, sep=';', dayfirst=True,
                     header=None, skiprows=4, names=cols)
    es.tail()
    #                SX5P     SX5E    SXXP    SXXE    SXXF    SXXA     DK5F    DKXF
    # 2016-09-28  2846.55  2991.11  342.57  324.24  407.97  350.45  9072.09  581.27
    # 2016-09-29  2848.93  2991.58  342.72  324.08  407.65  350.90  9112.09  582.60
    # 2016-09-30  2843.17  3002.24  342.92  325.31  408.27  350.09  9115.81  583.26
    # 2016-10-03  2845.43  2998.50  343.23  325.08  408.44  350.92  9131.24  584.32
    # 2016-10-04  2871.06  3029.50  346.10  327.73  411.41  353.92  9212.05  588.71

    vs = pd.read_csv(vs_txt, index_col=0, header=2, parse_dates=True, sep=',', dayfirst=True)
    vs.info()
    # <class 'pandas.core.frame.DataFrame'>
    # DatetimeIndex: 4357 entries, 1999-01-04 to 2016-02-12
    # Data columns (total 9 columns):
    # V2TX    4357 non-null float64
    # V6I1    3906 non-null float64
    # V6I2    4357 non-null float64
    # V6I3    4296 non-null float64
    # V6I4    4357 non-null float64
    # V6I5    4357 non-null float64
    # V6I6    4340 non-null float64
    # V6I7    4357 non-null float64
    # V6I8    4343 non-null float64
    # dtypes: float64(9)
    # memory usage: 340.4 KB

[Parameters of pd.read\_csv][pd.read_csv]

    pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None,
                index_col=None, usecols=None, squeeze=False, prefix=None,
                mangle_dupe_cols=True, dtype=None, engine=None, converters=None,
                true_values=None, false_values=None, skipinitialspace=False,
                skiprows=None, nrows=None, na_values=None, keep_default_na=True,
                na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False,
                infer_datetime_format=False, keep_date_col=False, date_parser=None,
                dayfirst=False, iterator=False, chunksize=None, compression='infer',
                thousands=None, decimal=b'.', lineterminator=None, quotechar='"',
                quoting=0, escapechar=None, comment=None, encoding=None, dialect=None,
                tupleize_cols=False, error_bad_lines=True, warn_bad_lines=True,
                skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False,
                as_recarray=False, compact_ints=False, use_unsigned=False,
                low_memory=True, buffer_lines=None, memory_map=False,
                float_precision=None)

The two series are then combined into one DataFrame, starting in 1999, and gaps are filled forward.

    import datetime as dt

    data = pd.DataFrame({'EUROSTOXX': es['SX5E'][es.index > dt.datetime(1999, 1, 1)]})
    data = data.join(pd.DataFrame({'VSTOXX': vs['V2TX'][vs.index > dt.datetime(1999, 1, 1)]}))
    # Forward-fill missing values (equivalent to fillna(method='ffill'))
    data = data.ffill()
    data.info()
    # <class 'pandas.core.frame.DataFrame'>
    # DatetimeIndex: 4554 entries, 1999-01-04 to 2016-10-04
    # Data columns (total 2 columns):
    # EUROSTOXX    4554 non-null float64
    # VSTOXX       4554 non-null float64
    # dtypes: float64(2)
    # memory usage: 266.7 KB

    data.tail()
    #             EUROSTOXX   VSTOXX
    # 2016-09-28    2991.11  35.6846
    # 2016-09-29    2991.58  35.6846
    # 2016-09-30    3002.24  35.6846
    # 2016-10-03    2998.50  35.6846
    # 2016-10-04    3029.50  35.6846

    # EURO STOXX 50 index and VSTOXX volatility index
    data.plot(subplots=True, grid=True, style='b', figsize=(8, 6))

![EURO STOXX 50 index and VSTOXX volatility index][EURO STOXX 50_VSTOXX]

Log returns of EURO STOXX 50 and VSTOXX

    rets = np.log(data / data.shift(1))
    rets.head()
    #             EUROSTOXX    VSTOXX
    # 1999-01-04        NaN       NaN
    # 1999-01-05   0.017228  0.489248
    # 1999-01-06   0.022138 -0.165317
    # 1999-01-07  -0.015723  0.256337
    # 1999-01-08  -0.003120  0.021570

    # Log returns of EURO STOXX 50 and VSTOXX
    rets.plot(subplots=True, grid=True, style='b', figsize=(8, 6))

![Log returns of EURO STOXX 50 and VSTOXX][EURO STOXX 50_VSTOXX 1]

Scatter plot of log returns and regression line

    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    y = rets.VSTOXX     # dependent variable
    X = rets.EUROSTOXX  # explanatory variable

    # Replace infinite and missing values by the median of the same series.
    # (The original notes filled X with y's median, which looks like a copy-paste slip,
    # so the figures printed below may differ marginally from a fresh run.)
    y_median = y.median()
    X_median = X.median()
    y = y.map(lambda v: y_median if np.isinf(v) or np.isnan(v) else v)
    X = X.map(lambda v: X_median if np.isinf(v) or np.isnan(v) else v)

    # OLS without an intercept; sm.add_constant(X) would add one
    model = sm.OLS(y, X)
    model = model.fit()
    model.summary()
    # <class 'statsmodels.iolib.summary.Summary'>
    # """
    #                             OLS Regression Results
    # ==============================================================================
    # Dep. Variable:                 VSTOXX   R-squared:                       0.526
    # Model:                            OLS   Adj. R-squared:                  0.525
    # Method:                 Least Squares   F-statistic:                     5043.
    # Date:                Wed, 27 Jun 2018   Prob (F-statistic):               0.00
    # Time:                        19:55:10   Log-Likelihood:                 8271.0
    # No. Observations:                4554   AIC:                        -1.654e+04
    # Df Residuals:                    4553   BIC:                        -1.653e+04
    # Df Model:                           1
    # Covariance Type:            nonrobust
    # ==============================================================================
    #                  coef    std err          t      P>|t|      [0.025      0.975]
    # ------------------------------------------------------------------------------
    # EUROSTOXX     -2.7538      0.039    -71.015      0.000      -2.830      -2.678
    # ==============================================================================
    # Omnibus:                     1298.794   Durbin-Watson:                   2.085
    # Prob(Omnibus):                  0.000   Jarque-Bera (JB):            24719.733
    # Skew:                           0.876   Prob(JB):                         0.00
    # Kurtosis:                      14.279   Cond. No.                         1.00
    # ==============================================================================
    # Warnings:
    # [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    # """

    model.params
    # EUROSTOXX   -2.753815
    # dtype: float64

    # Pick 100 equally spaced points between the minimum and the maximum of X
    X_prime = np.linspace(X.min(), X.max(), 100)[:, np.newaxis]
    # Compute the fitted values
    y_hat = model.predict(X_prime)
    plt.scatter(X, y, alpha=0.3)  # plot the raw data
    plt.xlabel("EURO STOXX 50 returns")
    plt.ylabel("VSTOXX returns")
    plt.plot(X_prime, y_hat, 'r', alpha=0.9)  # add the regression line in red

![Scatter plot of log returns and regression line][70 4]

Rolling correlation between EURO STOXX 50 and VSTOXX

    rets.corr()
    #            EUROSTOXX    VSTOXX
    # EUROSTOXX   1.000000 -0.724945
    # VSTOXX     -0.724945  1.000000

    # pd.rolling_corr no longer exists in current pandas; Series.rolling(...).corr() replaces it
    rets.EUROSTOXX.rolling(window=252).corr(rets.VSTOXX).plot(grid=True, style='b')

![Rolling correlation between EURO STOXX 50 and VSTOXX][EURO STOXX 50_VSTOXX 2]

## 6.4 High-Frequency Data ##

Intraday price data and trading volume for one trading day

    import numpy as np
    import datetime as dt
    import tushare as ts
    import matplotlib.pyplot as plt

    plt.rcParams['font.sans-serif'] = ['SimHei']  # needed to render Chinese labels correctly
    plt.rcParams['axes.unicode_minus'] = False    # needed to render the minus sign correctly

    # tushare is used here to fetch one week of 5-minute bars for 600118 (中国卫星)
    k_5 = ts.get_k_data(code="600118", start="2018-06-18", end="2018-06-22", ktype='5')
    k_5.head()
    # The data returned by this interface does not quite match the query,
    # but it is good enough as test data
    #                date   open  close   high    low  volume    code
    # 0  2018-06-12 14:55  20.57  20.57  20.58  20.54   489.0  600118
    # 1  2018-06-12 15:00  20.56  20.56  20.57  20.54  1163.0  600118
    # 2  2018-06-13 09:35  20.48  20.30  20.48  20.26  2895.0  600118
    # 3  2018-06-13 09:40  20.30  20.34  20.39  20.30  1258.0  600118
    # 4  2018-06-13 09:45  20.35  20.40  20.45  20.35   535.0  600118

    # Parse the date strings into datetime objects
    k_5['date'] = k_5['date'].map(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M'))
    # Keep the data of a single day
    plot_data = k_5[['date', 'close', 'volume']][(k_5['date'] > dt.datetime(2018, 6, 20)) &
                                                 (k_5['date'] < dt.datetime(2018, 6, 21))]
    # x positions and their tick labels
    l = plot_data.index
    lx = plot_data['date'].map(lambda x: dt.datetime.strftime(x, '%m-%d %H:%M'))

    # Plotting
    fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(8, 6))
    # Line plot of the closing prices
    ax1.plot(l, plot_data['close'].values)
    ax1.set_title('中国卫星')
    ax1.set_ylabel('index level')
    # Bar chart of the trading volume
    ax2.bar(l, plot_data['volume'].values.astype('int'), width=0.5)
    ax2.set_ylabel('volume')
    plt.xticks(l, lx,
               rotation=270)
    fig.subplots_adjust(bottom=0.3)
    plt.show()

![Intraday price data and trading volume for one trading day][70 5]

[DataFrame]: /images/20220521/9c1d49d803c641869d41ed269fd7b218.png
[plot]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
[Series]: /images/20220521/f727fd88187541e593864e5b4c9633e2.png
[70]: /images/20220521/f9dc1316762b422aaf116437c05f9f74.png
[70 1]: /images/20220521/5b08239c4630474498ae30a33f0141a0.png
[70 2]: /images/20220521/70858387e21748cca0767b8a3876180a.png
[70 3]: /images/20220521/23f30990bb1d47f0a2a646d398066d30.png
[pd.read_csv]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv
[EURO STOXX 50_VSTOXX]: /images/20220521/c37954528a7f4b6c9f7227160659d0eb.png
[EURO STOXX 50_VSTOXX 1]: /images/20220521/54c5eda216694bc28ac4b3b5bd7b2399.png
[70 4]: /images/20220521/31115ad089774edc828073799388ab30.png
[EURO STOXX 50_VSTOXX 2]: /images/20220521/4c503cd13cfc4a7986b2a9b156a63ef4.png
[70 5]: /images/20220521/5fc7b9a7cd5f4a1ab3d30c8d75d2537a.png
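
The rolling statistics used in section 6.2 (daily returns, moving averages, rolling annualized volatility) can be tried out without any download. The following is a minimal sketch under stated assumptions: the price series is a synthetic random walk, not the AAPL data from the notes, and the seed, the 600-day length and the 1% daily volatility are arbitrary choices made only for illustration. It uses the `Series.rolling` interface that current pandas versions provide in place of `pd.rolling_mean` and `pd.rolling_std`.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (an assumption for illustration, not market data)
rng = np.random.default_rng(seed=42)
dates = pd.date_range('2016-01-01', periods=600, freq='B')  # business days
log_steps = rng.normal(loc=0.0, scale=0.01, size=len(dates))
prices = pd.DataFrame({'Close': 100 * np.exp(np.cumsum(log_steps))}, index=dates)

# Daily log returns ln(P_t / P_{t-1}); the first row has no predecessor and is NaN
prices['Return'] = np.log(prices['Close'] / prices['Close'].shift(1))

# Moving averages over 42 and 252 trading days
prices['42d'] = prices['Close'].rolling(window=42).mean()
prices['252d'] = prices['Close'].rolling(window=252).mean()

# Rolling annualized volatility: 252-day standard deviation scaled by sqrt(252)
prices['Mov_Vol'] = prices['Return'].rolling(window=252).std() * np.sqrt(252)

print(prices[['Close', '42d', '252d', 'Mov_Vol']].tail())
```

By default a rolling window only produces a value once it is completely filled, so the first 41 values of `42d` and the first 251 values of `252d` are NaN. `Mov_Vol` starts one row later still, because the first return is itself missing.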