I was playing with some synthetic data in this zipped CSV file. It is time series data for four (toy) financial strategies, A through D. Each row is the record for one day, with weekends excluded.
matplotlib seems to be struggling with this data, at least in some circumstances. (Forewarning: I am very new to pandas and matplotlib, so I'm sorry if the below is inelegant or mistaken.)
In Python:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df_plot = pd.read_csv("df_plot.csv", index_col="date")
### pandas.DataFrame.plot: fast and correct (but mangled axes,
### not matplotlib proper, etc.)
plot = df_plot.plot()
fig = plot.get_figure()
fig.savefig("pandas.DataFrame.pyplot.png")
### matplotlib.pyplot.plot on the raw data: slow beyond belief
plt.figure(figsize=(10,6))
for strategy in ["A", "B", "C", "D"]:
    # label= gives plt.legend() something to show
    plt.plot(df_plot.index, df_plot[strategy], linestyle="-", label=strategy)
plt.legend()
plt.savefig("matplotlib.plot.raw.png")
### matplotlib.pyplot.plot with the raw data being padded with NAs to
### be pseudo-daily: fast, but it produces broken lines
daily_time = np.arange(np.datetime64(df_plot.index.min()),
                       np.datetime64(df_plot.index.max()),
                       np.timedelta64(1, 'D'))
df_tmp = pd.DataFrame(index=pd.Index(daily_time))
df_mpl = df_tmp.join(df_plot)
plt.figure(figsize=(10,6))
for strategy in ["A", "B", "C", "D"]:
    # label= gives plt.legend() something to show
    plt.plot(df_mpl.index, df_mpl[strategy], linestyle="-", label=strategy)
plt.legend()
plt.savefig("matplotlib.plot.adjoined_daily.png")
In R:
library(tidyverse)
tbl <- read_csv("df_plot.csv")
### ggplot2: fast and correct... ;)
tbl %>%
  gather(key = "strategy", value = "value", -date) %>%
  group_by(strategy) %>%
  ggplot() +
  geom_line(aes(x = date, y = value, colour = strategy))
# ggsave() saves the last plot; call it on its own rather than adding it with `+`
ggsave("ggplot.png", width = 10, height = 6)
Do you know how to resolve this issue? Am I missing something?
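One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: `pd.read_csv(..., index_col="date")` without `parse_dates` leaves the index as plain strings, and matplotlib treats string x values as categories rather than dates. The sketch below uses a small made-up CSV (`csv_text` is a hypothetical stand-in, not the attached data) to show how the index type differs between the two ways of reading the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for df_plot.csv: weekdays only, columns A through D.
csv_text = """date,A,B,C,D
2020-01-02,1.0,2.0,3.0,4.0
2020-01-03,1.1,2.1,3.1,4.1
2020-01-06,1.2,2.2,3.2,4.2
"""

# As in the snippet above: the "date" index is read as plain strings.
df_str = pd.read_csv(io.StringIO(csv_text), index_col="date")

# With parse_dates=True the index column is parsed into a DatetimeIndex.
df_dt = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

print(type(df_str.index).__name__, type(df_dt.index).__name__)
```

If the index does turn out to be strings in the real data, passing `df_dt.index` (a `DatetimeIndex`) to `plt.plot` instead might avoid both the slowdown and the need to pad with NAs, though I have not verified this against the attached file.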