Deep Learning For Options Trading: An End-To-End Approach
Wee Ling Tan∗ (weeling@robots.ox.ac.uk), Stephen Roberts (sjrob@robots.ox.ac.uk), Stefan Zohren (stefan.zohren@eng.ox.ac.uk)
Oxford-Man Institute of Quantitative Finance, University of Oxford, Oxford, United Kingdom
arXiv:2407.21791v1 [q-fin.PM] 31 Jul 2024
∗ Corresponding Author

ABSTRACT
We introduce a novel approach to options trading strategies using a highly scalable and data-driven machine learning algorithm. In contrast to traditional approaches that often require specifications of underlying market dynamics or assumptions on an option pricing model, our models depart fundamentally from the need for these prerequisites, directly learning non-trivial mappings from market data to optimal trading signals. Backtesting on more than a decade of option contracts for equities listed on the S&P 100, we demonstrate that deep learning models trained according to our end-to-end approach exhibit significant improvements in risk-adjusted performance over existing rules-based trading strategies. We find that incorporating turnover regularization into the models leads to further performance enhancements at prohibitively high levels of transaction costs.

KEYWORDS
Options, Derivatives, Trading Strategies, Machine Learning, Momentum, Mean-Reversion

1    INTRODUCTION
Options form a class of derivatives known as contingent claims, granting one of the counterparties the right to transact an underlying asset at a certain time in the future and at a specific price. For instance, the buyer of a call (put) stock option pays a premium to the seller for the right to buy (sell) the underlying stock at a specified strike price by a specified expiration date. Unlike in cash markets that typically exhibit linear payoffs, the ability to generate synthetic and highly non-linear exposures has made options an increasingly important tool among investors as trading instruments alongside other derivatives [4].

The options market has continued to grow significantly over the last decade. According to the Options Clearing Corporation, the average daily combined volume of US equity and non-equity options has steadily risen and roughly tripled over the past decade, from about 16.3 million contracts in 2013 to 44.2 million contracts in 2023 [8]. Given the increasing popularity of options trading by market participants conducted in modern electronic exchanges, there exists an unprecedented opportunity to engage in large-scale data analysis of options trading from the perspective of an active investor.

Classical option pricing models and their parametric variants, stemming from the seminal works of Black and Scholes [5] and Merton [25], regard options as redundant assets. While these approaches offer a tractable no-arbitrage pricing framework, they necessitate thorough specifications of underlying dynamics and require assumptions such as a frictionless market and the ability of market makers to perfectly hedge their exposures. Practically, these models are often vulnerable to model misspecification and are usually too simplistic to sufficiently account for empirically observed variations in option returns [6]. Recent works have challenged this framework, demonstrating that option prices are influenced by additional risks beyond the underlying asset's exposure [9, 13, 15], with several studies documenting the existence of mispricing in the options market [1, 12].

Despite the accelerating growth of the options market and the evidence of options mispricing, there is a surprising lack of research on a scalable machine learning-based strategy that is able to directly leverage the abundance of data to conduct options trading on behalf of an active investor. In this work, we address this gap by introducing a novel class of deep learning models capable of managing and trading a portfolio of options in a highly data-driven approach.

Previous research has conventionally approached the complexities of managing portfolios of options by developing methods for optimal hedging and accurately pricing options. Our models depart fundamentally from the need for these prerequisites and are directly optimized to learn highly non-trivial mappings from observed data to optimal trading decisions based on risk-adjusted performance. The resulting end-to-end framework does not depend on specific market dynamics, and can be extended broadly across instruments where market data is available.

To illustrate our approach, we construct the framework by training end-to-end neural networks of different complexities and show that our models are able to outperform existing rules-based strategies in managing a portfolio of options, demonstrating strong risk-adjusted returns over a backtest period of over a decade. We subsequently extend our models to incorporate turnover regularization during optimization, leading to further performance enhancements in the presence of high transaction costs.

2    RELATED WORK
To place our work within the context of existing research on options, we briefly review the broad literature in three areas: replication and hedging, pricing and valuation, and predicting option returns. We subsequently demonstrate how our work contributes to existing systematic trading strategies within options markets.
2.1    Options Literature
Traditionally, options have been extensively studied due to their significance in both replication and hedging strategies. The works of [7] and [22] apply reinforcement learning techniques towards approximating policies for optimal hedging in the presence of trading costs. While these works are pertinent to a product or liquidity provider needing to accurately hedge its exposures to books of options and other complex derivatives, our analysis fundamentally differs in its focus on options trading from the standpoint of an active investor who seeks to profit from an options trading strategy and whose main objective is not predominantly about neutralizing exposures. Furthermore, hedging techniques based on reinforcement learning often require generating or simulating possible market paths to arrive at an optimal trading strategy. In contrast, the solution we offer does not require any assumption or simulation of market processes, and scales with available historical data and compute.

There has also been significant emphasis placed on the need to accurately price or value an option in order to facilitate optimal hedging and trading, with notable contributions from the classical Black-Scholes-Merton option pricing model for European-style options [5, 25]. Non-parametric models have also been developed, with [18] and [19] using neural networks to approximate the market's option pricing function. Conversely, our framework focuses on automatically extracting features and making trading decisions directly from available data, effectively circumventing the need for engineering an option pricing or valuation model.

Several studies have also employed machine learning for predicting option returns, often framing the trading strategy as a standard regression problem and subsequently determining the direction of price movements based on forecasted returns. Bali et al. [1] use both linear and nonlinear models and ex-ante option-based and stock-based characteristics to predict monthly returns of delta-hedged options, demonstrating high out-of-sample forecast accuracy. However, in the case of a trend-following strategy, [24] shows that models that accurately predict returns do not necessarily guarantee positive or superior strategy performance, given that the overall profitability of trading strategies is influenced by other factors including the distribution of returns, position sizes and the presence of risk adjustments like volatility targeting [14]. Taking this into consideration, our work integrates both trend prediction and optimal position sizing simultaneously within a single end-to-end function, eliminating the need to forecast option returns in making trading decisions.

2.2    Systematic Strategies and Risk Premia in Options Markets
Our work contributes to a body of research documenting systematic strategies, originating from studies demonstrating additional risk factors [6, 9, 13, 15, 34] and the existence of mispricing [1, 12] within the options market. Considering the broad range of possible risk factors and strategies, we concentrate our analysis on systematic trend-based strategies. Given their simplicity of being based primarily on historical price trends and returns, and unlike volatility-based strategies, trend-based strategies often do not require additional assumptions on any underlying option pricing model or specifications of market dynamics.

2.3    Momentum, Mean-Reversion and Applications of Machine Learning
Trend-based strategies consist primarily of momentum and mean-reversion strategies. Momentum strategies operate on the principle that asset returns exhibit a tendency to persist in the same direction and aim to trade in the direction of the trend [20, 26]. On the other hand, mean-reversion strategies adopt an opposite and contrarian view, taking on positions that bet on the eventual breakdown and correction of overextended trends [11, 29]. Recently, machine learning models have been increasingly utilized in trend-based trading strategies [24, 27, 28, 32, 35, 36].

While momentum strategies have been extensively documented in a range of asset classes from cash equities [30] to futures markets [26], these strategies have received relatively less attention in options markets. Heston et al. [15] document strong evidence of the presence of momentum in options markets and propose a series of rules-based momentum strategies. However, these rules-based strategies require explicit specifications for the trading rule, often with insufficient evidence to justify selecting a certain rule over another. In light of these observations, we employ deep neural networks to automatically learn risk-adjusted trading rules in a data-driven approach, and provide a thorough comparison between these trend-based strategies.

3     OVERVIEW OF DATASET
Our dataset consists of option contracts sourced from the OptionMetrics Ivy DB database, comprising end-of-day bid-ask prices, implied volatility and Greeks of individual options. We independently verify the accuracy of the provided implied volatility and Greeks using a binomial tree model of Cox et al. [10] only for the purpose of initial delta hedging (refer to Section 4), but otherwise we do not assume any underlying option pricing model throughout this work. We focus our analysis on option contracts of equities listed on the S&P 100 Index, as these companies span a wide range of sectors, representing major large-cap optionable companies in the US market, and are associated with higher option liquidity. We obtain underlying stock prices from the Center for Research in Security Prices (CRSP), which we use for computing option moneyness and accounting for corporate actions such as stock splits. We perform backtesting with the most recent market data (as of this work) from 2010 to 2023, which notably includes the market selloff following the COVID-19 pandemic.

We impose a series of data filters established in the literature to ensure the consistency of our analysis. We consider only standard monthly option contracts that expire on the third Friday of the month and exclude any special settlements due to corporate actions. We then exclude options with price observations that breach American-style option bounds, that have a bid price of zero, or whose ask price is less than or equal to the bid price, and require options to have a positive open interest on the day of portfolio formation. In addition, we disregard options that contain one or more missing observations between the day of portfolio formation and expiration.
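The quote-level filters described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `OptionQuote` record and its field names are hypothetical (they are not OptionMetrics column names), each quote is assumed to be observed on the day of portfolio formation, and the American-style bound check (intrinsic value as a lower bound, stock or strike price as an upper bound on the mid price) is one plausible reading of the bounds rather than the exact rule used in the paper. The missing-observation filter is left out because it needs the full price history of each contract.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OptionQuote:
    # Hypothetical record layout; field names are illustrative only.
    is_call: bool
    strike: float
    spot: float
    bid: float
    ask: float
    open_interest: int
    expiry: date
    special_settlement: bool


def is_third_friday(d: date) -> bool:
    """True if d is the third Friday of its month (Monday is weekday 0)."""
    return d.weekday() == 4 and 15 <= d.day <= 21


def within_american_bounds(q: OptionQuote) -> bool:
    """Assumed bounds: intrinsic value <= mid <= spot (call) or strike (put)."""
    mid = 0.5 * (q.bid + q.ask)
    if q.is_call:
        return max(q.spot - q.strike, 0.0) <= mid <= q.spot
    return max(q.strike - q.spot, 0.0) <= mid <= q.strike


def passes_filters(q: OptionQuote) -> bool:
    """Apply the quote-level filters from the text to a single quote."""
    return (
        is_third_friday(q.expiry)        # standard monthly contract
        and not q.special_settlement     # no corporate-action settlement
        and q.bid > 0.0                  # bid price of zero is excluded
        and q.ask > q.bid                # ask must exceed bid
        and q.open_interest > 0          # positive open interest
        and within_american_bounds(q)    # assumed price-bound check
    )


quotes = [
    OptionQuote(True, 100.0, 101.0, 2.9, 3.1, 50, date(2023, 6, 16), False),
    OptionQuote(False, 100.0, 101.0, 0.0, 0.2, 10, date(2023, 6, 16), False),
]
kept = [q for q in quotes if passes_filters(q)]
print(len(kept), "of", len(quotes), "quotes kept")
```

Expressing each rule as a separate boolean condition keeps the filters easy to audit one by one against the description in the text.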
On the expiration day of each month and for each stock, we form portfolios of static delta-neutral straddle options by selecting a pair of call and put contracts with identical strike prices that are closest to at-the-money (ATM) and expire in the following month. Since it is usually not possible to select options with moneyness exactly equal to 1.0 (moneyness of call = $S/K$, put = $K/S$, where $K$ = strike price and $S$ = stock price), we select the pair of options that are closest to ATM within a moneyness range of 0.95 to 1.05.

Our resulting universe consists of 29,984 traded options, with a total of 603,068 daily return observations throughout the backtest period. The returns of straddle options have positive means (1.41% monthly), large standard deviations (90.85% monthly), and significant positive skewness, as indicated by a low median (-15.61% monthly). We provide an in-depth explanation of portfolio formation and returns computation in Section 4.

4    SYSTEMATIC OPTIONS TRADING STRATEGIES
Let $i = 1, 2, \cdots, N_t$ denote individual underlying stocks. For a given portfolio of (1-month, ATM, static delta-neutral) straddle options of these stocks that is rebalanced daily, the overall returns of a strategy that equally diversifies over $N_t$ straddles at day $t$ can be expressed as follows:

\[
r^{\text{STRATEGY}}_{t,t+1} = \frac{1}{N_t} \sum_{i=1}^{N_t} X_t^{(i,\text{straddle})} \left( \frac{\sigma_{\text{tgt}}}{\sigma_t^{(i,\text{straddle})}} \right) r_{t,t+1}^{(i,\text{straddle})} \tag{1}
\]

whereby

\[
r_{t,t+1}^{(i,\text{straddle})} = \frac{p_{t+1}^{(i,\text{straddle})} - p_t^{(i,\text{straddle})}}{p_t^{(i,\text{straddle})}} \tag{2}
\]

\[
\begin{aligned}
p_t^{(i,\text{straddle})} &= w_{\text{norm}}^{(i,\text{call})} \, p_t^{(i,\text{call})} + w_{\text{norm}}^{(i,\text{put})} \, p_t^{(i,\text{put})} \\
w_{\text{norm}}^{(i,\text{call})} &= \frac{w^{(i,\text{call})}}{w^{(i,\text{call})} + w^{(i,\text{put})}} \\
w_{\text{norm}}^{(i,\text{put})} &= 1 - w_{\text{norm}}^{(i,\text{call})} \\
w^{(i,\text{call})} &= -\Delta_0^{(i,\text{put})} \\
w^{(i,\text{put})} &= \Delta_0^{(i,\text{call})}
\end{aligned}
\]

On the day of portfolio formation ($t = 0$), we construct static delta-neutral straddle options by holding the call and put options with weights $-\Delta_0^{(i,\text{put})}$ and $\Delta_0^{(i,\text{call})}$ respectively, with $\Delta_0$ representing initial deltas. Since put deltas are negative, both weights are positive. We subsequently normalize both weights to sum to one, resulting in a respective weightage of the call and put options that is generally close to 50-50. In evaluating the price of the straddle, we take $p_t^{(i,\text{call})}$ and $p_t^{(i,\text{put})}$ to be the bid-ask midpoints of the call and put options.

Consistent with other works [13, 15], we focus on static delta-neutral straddles as these instruments are on average invariant to movements in the underlying. [33] demonstrate that performing one-time delta hedging at initiation neutralizes the overall directional risks associated with an option by about 70%. As such, we opt for delta hedging at initiation and, following similar works, we disregard the early exercise premium embedded in the American-style options and therefore ignore the possibility of early assignment of short options.

Here, $X_t^{(i,\text{straddle})} \in [-1, 1]$ denotes the trading signal or position for the straddle option of stock $i$ at day $t$, and $r_{t,t+1}^{(i,\text{straddle})}$ is the realized return of the straddle from $t$ to $t + 1$. We find evidence that straddle options in the cross-section of individual constituents of the S&P 100 exhibit different levels of volatility using a Levene's test at a significance level of 1%. Given these differences, we include volatility targeting at the level of individual straddle options to scale the realized returns $r_{t,t+1}^{(i,\text{straddle})}$ by their volatility in order to target equal assignments of risk. We set the annualized volatility target $\sigma_{\text{tgt}}$ to be 15% and estimate the ex-ante volatility $\sigma_t^{(i,\text{straddle})}$ with a 20-day exponentially weighted moving standard deviation of daily straddle returns.

All of the following trend-based benchmarks which we incorporate in our work adhere to this general framework and are concerned with constructing an accurate trading signal $X_t^{(i,\text{straddle})}$:

Long Only (Short Only). This strategy takes a maximum long (short) position $X_t^{(i,\text{straddle})} = 1$ or $-1$ for all straddle options in the portfolio. Since the performance of a short only strategy is the exact opposite of a long only strategy, we focus only on the long only strategy.

TSMOM (TSMR). Following the time-series momentum strategy (TSMOM) of Moskowitz et al. [26], we adopt the strategy with a monthly horizon. The position taken for a straddle option is based on the sign of the option's returns over the past 20 days: $X_t^{(i,\text{straddle})} = \operatorname{sgn}(r_{t-20,t}^{(i,\text{straddle})})$. In the time-series mean reversion strategy (TSMR), we modify TSMOM to take on a negative loading on past returns, where $X_t^{(i,\text{straddle})} = -\operatorname{sgn}(r_{t-20,t}^{(i,\text{straddle})})$. This strategy takes a contrarian approach by taking on a long (short) position for straddles with negative (positive) returns over the past 20 days.

MACD (MACDMR). We use volatility normalised moving average convergence divergence (MACD) indicators based on Baz et al. [2] in place of the sign of returns for estimating the trading signal:

\[
\begin{aligned}
\text{MACD}(i, t, S, L) &= m(i, t, S) - m(i, t, L) \\
\text{MACD}_{\text{norm}}(i, t, S, L) &= \frac{\text{MACD}(i, t, S, L)}{\operatorname{std}(p_{t-5:t})} \\
Y_t^{(i,\text{straddle})} &= \frac{\text{MACD}_{\text{norm}}(i, t, S, L)}{\operatorname{std}(\text{MACD}_{\text{norm}}(i, t-20:t, S, L))}
\end{aligned}
\]

\[
X_t^{(i,\text{straddle})} = \frac{1}{3} \sum_{k=1}^{3} \phi\left( Y_t^{(i,\text{straddle})}(S_k, L_k) \right) \tag{3}
\]

where $\text{MACD}(i, t, S, L)$ denotes the MACD value of the straddle option of stock $i$ at day $t$ with a short time scale $S$ and long time scale $L$. $m(i, t, j)$ is defined as the exponentially weighted moving average of the straddle's prices at day $t$, with a time scale $j$ corresponding to a half-life of $HL = \log(0.5)/\log(1 - 1/j)$. We combine MACD signals in an equally weighted sum over multiple short and long time scales $S_k \in \{2, 4, 8\}$, $L_k \in \{8, 16, 32\}$ to construct a
position $X_t^{(i,\text{straddle})}$, where $\phi(y) = \frac{y \exp(-y^2/4)}{0.89}$. Likewise, we consider both the momentum $X_t^{(i,\text{straddle})}$ (MACD) and mean reversion $-X_t^{(i,\text{straddle})}$ (MACDMR) strategies. We refer the reader to [2] for further details on the strategy implementation.

TSHestonMOM (TSHestonMR). Following Heston et al. [15], we consider a strategy that constructs the trading signal for the straddle option of stock $i$ on the day of portfolio formation ($t = 0$), holding the position unchanged to expiry:

\[
Y_{0:t}^{(i,\text{straddle})} = r_{0-nM,0}^{(i,\text{straddle})} \tag{4}
\]

\[
X_{0:t}^{(i,\text{straddle})} = \operatorname{sgn}\left( Y_{0:t}^{(i,\text{straddle})} \right) \tag{5}
\]

where $r_{0-nM,0}^{(i,\text{straddle})}$ is the average return for the series of (1-month, ATM, static delta-neutral) straddle options for stock $i$ over a look-back period of $n$ months from the day of portfolio formation. A key distinction between TSMOM and TSHestonMOM is that the momentum signals identified in the latter strategy are more accurately referred to as cross-serial correlations, in that the options used to compute the $n$-period average returns $r_{0-nM,0}^{(i,\text{straddle})}$ are different from those used to construct the trading signal for the options to be traded following $t = 0$. In other words, for TSHestonMOM, all types of momentum signals involve past returns of a set of options predicting the future returns on an entirely new set. Similar to [15], we consider multiple strategies corresponding to averages of monthly straddle returns over look-back periods of $n = 1, 3, 6, 12$ months, corresponding to monthly, quarterly, semiannual and annual returns. We again consider both the momentum $X_{0:t}^{(i,\text{straddle})}$ (TSHestonMOM) and mean reversion $-X_{0:t}^{(i,\text{straddle})}$ (TSHestonMR) strategies.

CSHestonMOM (CSHestonMR). Based on the long-short approach of [15] and [20], we implement a cross-sectional momentum strategy that scores and ranks a stock based on its average straddle returns computed as per TSHestonMOM. On the day of portfolio formation, the strategy utilizes a high-minus-low decile portfolio, taking a maximum long and short position for the top and bottom 10% of ranked stocks respectively and holding the positions to expiry. We first calculate raw momentum scores $Y_{0:t}^{(i,\text{straddle})} = r_{0-nM,0}^{(i,\text{straddle})}$ according to Equation (4). Then, we rank stocks by their raw mo-

model $f$ with trainable parameters $\boldsymbol{\theta}$:

\[
X_t^{(i,\text{straddle})} = f\left( \mathbf{u}_t^{(i,\text{straddle})}; \boldsymbol{\theta} \right) \tag{7}
\]

In this framework, trading signals $X_t^{(i,\text{straddle})}$ are directly computed using the point-in-time snapshot of the option's features, integrating both trend prediction and optimal position sizing within a single function $f$. Unlike standard supervised learning paradigms, a distinctive characteristic of our end-to-end framework is the unavailability of ground-truth labels for the optimal trading signal $X_t^{(i,\text{straddle})}$ of a given option at any point in time. This necessitates the learning of a non-trivial mapping from an option's features to optimal trading signals. We discuss the choice of architectures and optimization of these models in the following sections.

5.2      Network Architectures
Given the problem of learning a mapping from option features to trading signals, it is not immediately clear which choice of architecture would best suit an end-to-end model, warranting the need to consider multiple options. Taking this into account, we examine various choices of neural networks for $f$.

Linear. We begin with the elementary case of a neural network comprising a single fully connected layer:

\[
X_t^{(i,\text{straddle})} = g\left( \mathbf{W}^{\top} \mathbf{u}_t^{(i,\text{straddle})} + b \right) \tag{8}
\]

where $\mathbf{W} \in \mathbb{R}^m$, $\mathbf{u}_t^{(i,\text{straddle})} := \mathbf{u}_{t-\tau+1:t}^{(i,\text{straddle})} \in \mathbb{R}^m$ with $m = \tau d$, $b \in \mathbb{R}$ and $g = \tanh$ is the activation function. The model computes a linear combination of the input features and passes it through the tanh activation function, which constrains the trading signal to the finite range of $[-1, 1]$. To factor in the immediate temporal history of observations for making predictions, we concatenate features from the past $\tau = 5$ days from time $t$ into a single input vector.

Multilayer Perceptron (MLP). We add a hidden layer to the Linear model, increasing the model's depth:

\[
X_t^{(i,\text{straddle})} = g\left[ \mathbf{W}^{[2]\top} \sigma\left( \mathbf{W}^{[1]\top} \mathbf{u}_t^{(i,\text{straddle})} + b^{[1]} \right) + b^{[2]} \right] \tag{9}
\]

with $g = \sigma = \tanh$ corresponding to the activation functions of each layer.
mentum scores:
                                (𝑖,straddle)
                            𝑋 0:𝑡               = {+1, −1, 0}         (6)       Convolutional Neural Networks (CNN). Modified for time
                                                                             series data, CNNs have been designed to incorporate causal convolu-
          (𝑖,straddle)                                                       tions that utilize only past information for forecasting [3], maintain-
where 𝑋 0:𝑡       = +1 for stocks ranked in the top 10% and −1 for
the bottom 10% ranked stocks, and 0 otherwise. We examine both               ing the autoregressive ordering of temporal features. We consider
                 (𝑖,straddle)                                                a 1-D autoregressive CNN:
the momentum 𝑋 0:𝑡            (CSHestonMOM) and mean reversion
    (𝑖,straddle)
−𝑋 0:𝑡             (CSHestonMR) strategies.                                    h𝑡
                                                                                   (𝑖,straddle)
                                                                                                   = 𝑃𝜎 [W𝑐
                                                                                                             [2]
                                                                                                                   ∗ 𝜎 (W𝑐
                                                                                                                           [1]
                                                                                                                                 ∗ u𝑡
                                                                                                                                     (𝑖,straddle)        [1]
                                                                                                                                                      + b𝑐 ) + b𝑐 ]
                                                                                                                                                                   [2]
                                                                                   (𝑖,straddle)                                  (𝑖,straddle)
5 DEEP LEARNING FOR OPTIONS TRADING                                            𝑋𝑡                  = 𝑔[W [2]⊤ 𝜎 (W [1]⊤ h𝑡                       + 𝑏 [1] ) + 𝑏 [2] ] (10)
5.1 General End-To-End Framework                                                          [𝑙 ]                                                                           [𝑙 ]
                                                                             where W𝑐 represent convolutional kernels with bias terms b𝑐
We frame the problem of generating optimal trading decisions                 and activation functions 𝑔 = 𝜎 = tanh, and ∗ represents the causal
  (𝑖,straddle)
𝑋𝑡             for a portfolio of options with a model 𝑓 as an end-          convolution operator. Subsequently, we perform average pooling
to-end framework. Given time 𝑡 and a straddle option of stock 𝑖,             with 𝑃 prior to passing the activations h𝑡 to a fully connected neural
                      (𝑖,straddle)
we have an input u𝑡                ∈ R𝑑 of option features. We learn a       network.
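To make the feed-forward variants above concrete, the following is a minimal pure-Python sketch of the Linear signal in Equation (8), the MLP signal in Equation (9), and a single-channel causal convolution of the kind used before pooling in Equation (10). The function names, the zero left-padding, and all weight values are illustrative assumptions, not the authors' implementation.

```python
import math

def linear_signal(u, w, b):
    """Equation (8): tanh of one affine map of the flattened window u (length m = tau * d)."""
    return math.tanh(sum(wi * ui for wi, ui in zip(w, u)) + b)

def mlp_signal(u, W1, b1, w2, b2):
    """Equation (9): one tanh hidden layer followed by a tanh output unit.

    W1 holds one weight row (length m) per hidden unit; w2 has one weight per hidden unit.
    """
    hidden = [math.tanh(sum(wi * ui for wi, ui in zip(row, u)) + bj)
              for row, bj in zip(W1, b1)]
    return math.tanh(sum(wj * hj for wj, hj in zip(w2, hidden)) + b2)

def causal_conv1d(x, kernel, bias=0.0):
    """Single-channel causal convolution: output[t] depends only on x[t], x[t-1], ...

    Zero left-padding (an assumption of this sketch) keeps the output length equal to len(x).
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[t + k - 1 - j] for j in range(k)) + bias
            for t in range(len(x))]

# Illustrative inputs: tau = 5 days of d = 2 features, flattened into m = 10 values.
tau, d = 5, 2
m = tau * d
u = [0.1 * (i + 1) for i in range(m)]

# Hypothetical weights chosen only to exercise the functions.
x_lin = linear_signal(u, w=[0.05] * m, b=0.0)
x_mlp = mlp_signal(u, W1=[[0.05] * m, [-0.05] * m], b1=[0.0, 0.0],
                   w2=[1.0, -1.0], b2=0.0)
conv_out = causal_conv1d([1.0, 2.0, 3.0], kernel=[1.0, 1.0])
```

Because the output activation is tanh, both signals fall strictly inside $(-1, 1)$, and changing a later element of the convolution input leaves all earlier outputs unchanged, which is the causality property the CNN relies on.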
   Long Short-term Memory (LSTM). Recurrent neural networks (RNNs) have traditionally found applications in sequence modelling and time series forecasting [23]. Given the sequential nature of our prediction task, it is natural to consider recurrent architectures. We implement a single-layer LSTM model [16] that takes an input sequence of option features $\mathbf{u}_t^{(i,\mathrm{straddle})} := \mathbf{u}_{t-\tau+1:t}^{(i,\mathrm{straddle})} \in \mathbb{R}^m$ with $m = \tau d$, where $\tau$ represents the length of a trajectory, and we subdivide the time series into trajectories of length $\tau = 20$ during backpropagation. We omit the technical equations for the LSTM architecture for brevity.

5.3 Training Details

5.3.1 Loss Function. To facilitate the learning of a non-trivial mapping from option features to optimal trading signals that effectively balances both risk and reward, we directly calibrate the models using the Sharpe ratio [31], a risk-adjusted performance metric. Given a set of contemporaneous option features and their respective trading signals $\mathcal{D}_\Omega = \{(\mathbf{u}_t^{(i,\mathrm{straddle})}, X_t^{(i,\mathrm{straddle})} = f(\mathbf{u}_t^{(i,\mathrm{straddle})}; \boldsymbol{\theta}))\}$, with $\Omega = \{(i, t) \mid i = 1, \cdots, N_t,\ t = 1, \cdots, T\}$ denoting all straddle-time pairs, we define the loss $\mathcal{L}_{\mathrm{sharpe}}(\boldsymbol{\theta})$ over $\mathcal{D}_\Omega$ as the negative annualized Sharpe ratio:

    $$\mathcal{L}_{\mathrm{sharpe}}(\boldsymbol{\theta}) = -\frac{\frac{1}{|\Omega|}\sum_{\Omega} R_i(t) \times \sqrt{252}}{\sqrt{\frac{1}{|\Omega|}\sum_{\Omega} R_i(t)^2 - \left[\frac{1}{|\Omega|}\sum_{\Omega} R_i(t)\right]^2}} \tag{11}$$

    $$R_i(t) = X_t^{(i,\mathrm{straddle})} \left(\frac{\sigma_{\mathrm{tgt}}}{\sigma_t^{(i,\mathrm{straddle})}}\right) r_{t,t+1}^{(i,\mathrm{straddle})} \tag{12}$$

5.3.2 Optimization. Within each in-sample window, we perform a train-validation split, with the earliest 90% of the data used for calibrating the models and the most recent 10% reserved for validation. To calibrate the models, we perform backpropagation using minibatch stochastic gradient descent with Adam [21], and trigger early stopping with a patience of 25 epochs based on the validation loss. To select optimal candidates for each machine learning model, we conduct hyperparameter optimization with 100 iterations of random search. We refer the reader to Appendix A for a detailed description of the hyperparameter search ranges. Model calibration was performed on a server equipped with an AMD EPYC7713 CPU and multiple NVIDIA L40 GPUs.

6 PERFORMANCE EVALUATION
6.1 Backtest Details

Following an expanding window approach, we retrain all models with every additional 5-year block of data. In each block, we fix the weights and hyperparameters of the trained models and evaluate the models out-of-sample in the following 5-year window. We perform model calibration over multiple seeded runs and present the aggregated out-of-sample results in Section 6.3.

6.2 Option Features

To construct an input of option features $\mathbf{u}_t^{(i,\mathrm{straddle})} \in \mathbb{R}^d$ as described in Equation (7), we include a combination of the predictors used in the strategies outlined in Section 4:

   I. Normalized Returns – we use $r_{t-k,t}^{(i,\mathrm{straddle})} / (\sigma_t^{(i,\mathrm{straddle})} \sqrt{k})$, representing straddle returns normalized by daily volatility estimates scaled to a time scale $k \in \{1, 5, 10, 15, 20\}$, which corresponds to daily, weekly, biweekly, triweekly and monthly returns.
  II. MACD Indicators – we take volatility-normalized MACD signals $Y_t^{(i,\mathrm{straddle})}(S_k, L_k)$ from Equation (3) with short and long time scales $S_k \in \{2, 4, 8\}$ and $L_k \in \{8, 16, 32\}$.
 III. Option Momentum Features – we expand our set of predictors to include the momentum features as defined in Equation (4), taking the average returns of straddle options for the stock over lookback periods of $n \in \{1, 3, 6, 12\}$ months.
  IV. Core Option Features – to facilitate comparability with the trend-based benchmarks outlined in Section 4, and to focus on the predictive power of the features above, we maintain a parsimonious set of core option features comprising the log-moneyness (of both the call and the put option forming the straddle) and days to expiry (DTE, in years). Given that the moneyness and DTE of a straddle option change over time, these core contract features are necessary to identify the option at particular stages of its lifespan. Notably, we exclude other features such as option implied volatility or sensitivity measures such as Greeks, which would require assuming an underlying option pricing model such as the Black-Scholes or the binomial model. Given our focus on delta-neutral straddles, we also exclude underlying stock characteristics and stock returns from our core set of features.

6.3 Results and Discussion

We use the following annualized metrics to evaluate the out-of-sample performance of all strategies:

   I. Profitability Measures – Expected Returns (E[Returns]), Hit Rate
  II. Risk Measures – Volatility (Vol.), Downside Deviation, Maximum Drawdown (MDD)
 III. Performance Ratios – Sharpe, Sortino and Calmar Ratios, Average Profit over Loss (Ave. P / Ave. L)

We present the aggregated out-of-sample performance metrics of all strategies computed using the overall returns according to Equation (1). First, we present the performance of all strategies from their raw signal outputs in Table 1. We then apply to all strategies (excluding unprofitable strategies) an additional layer of volatility scaling to target an annualized volatility of 15% at the portfolio level, report the performance in Table 2, and plot the cumulative returns in Figure 1. This adjustment at the portfolio level facilitates comparison between individual strategy returns in line with our 15% volatility target. For each Heston portfolio (TSHestonMOM, TSHestonMR, CSHestonMOM, CSHestonMR), we report results for the best performing lookback period for the sake of brevity. In this section, we report the performance of all strategies without factoring in transaction costs, in order to evaluate their raw predictive ability. In Section 6.4, we include an analysis of the impact of transaction costs and the effect of turnover regularization.
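The Sharpe ratio appears both as the training objective of Section 5.3.1 and among the performance ratios reported here, so a concrete reference may be useful. The following is a minimal sketch of Equations (11) and (12) in plain Python; the function names and the sample inputs are hypothetical, and a real training loop would evaluate the same expression inside an automatic-differentiation framework rather than on Python lists.

```python
import math

def scaled_return(signal, sigma_tgt, sigma, next_return):
    """Equation (12): return captured by one straddle-time pair after volatility scaling."""
    return signal * (sigma_tgt / sigma) * next_return

def sharpe_loss(captured_returns, periods_per_year=252):
    """Equation (11): negative annualized Sharpe ratio over all pairs in Omega."""
    n = len(captured_returns)
    mean = sum(captured_returns) / n
    mean_sq = sum(r * r for r in captured_returns) / n
    variance = mean_sq - mean * mean
    if variance <= 0.0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return -(mean * math.sqrt(periods_per_year)) / math.sqrt(variance)

# Hypothetical signals, daily volatility estimates and next-day straddle returns.
sigma_tgt = 0.15
signals = [0.8, -0.4, 1.0, 0.2]
sigmas = [0.30, 0.25, 0.40, 0.20]
next_returns = [0.02, 0.01, 0.03, -0.01]

captured = [scaled_return(x, sigma_tgt, s, r)
            for x, s, r in zip(signals, sigmas, next_returns)]
loss = sharpe_loss(captured)
```

Minimizing this loss maximizes the Sharpe ratio of the captured returns: a profitable set of signals yields a negative loss, and flipping the sign of every signal flips the sign of the loss.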
                   Transaction Costs (bps)            0.0       1.0       2.0        3.0       5.0      10.0      20.0       50.0
                   Benchmarks
                   Long Only                          0.697     0.695     0.692      0.690     0.685    0.673     0.650      0.578
                   TSMR                               0.762     0.757     0.752      0.747     0.737    0.713     0.663      0.514
                   MACDMR                             0.655     0.646     0.638      0.629     0.612    0.568     0.481      0.219
                   TSHestonMR                         0.626     0.621     0.615      0.610     0.600    0.574     0.522      0.367
                   CSHestonMR                         0.573     0.570     0.567      0.563     0.557    0.541     0.508      0.410
                   Deep Learning Models
                   LSTM                               1.329*    1.310*    1.291*     1.272*    1.235*   1.140     0.952      0.388
                   LSTM + TC Reg.                     1.282     1.270     1.259      1.247     1.223    1.164*    1.045*     0.689*
[Figure 1: Cumulative returns (log scale) of Long Only, Linear, MLP, CNN, LSTM, TSMR, MACDMR, TSHestonMR and CSHestonMR.]
   From Table 1, we find that the Long Only straddle portfolio was a profitable strategy over the backtest period. This observation is interesting, as it runs contrary to [17], who show that retail and institutional investors executing short volatility strategies tend to perform well. Furthermore, a Long Only options portfolio would typically benefit from limited downside exposure, as opposed to Short Only. Turning our attention to trend-based strategies, we observe that mean-reversion portfolios (TSMR, MACDMR, TSHestonMR, CSHestonMR) exhibit positive performance, in contrast to their momentum counterparts (TSMOM, MACD, TSHestonMOM, CSHestonMOM), which were generally unprofitable over the backtest period. We note that both TSMR and CSHestonMR exhibited only slight performance improvements over Long Only. Our results for the Heston portfolios are similar to those reported in [15], who document significant reversals in option returns at short-term horizons, and our best performing Heston models correspond mostly to strategies adopting the shortest lookback period of $n = 1$ month. Moving on to our deep learning models, apart from the CNN, we observe that the Linear, MLP and LSTM models clearly outperform all other strategies, as seen from their higher performance ratios.
   Referring to Table 2, with the implementation of volatility targeting at the portfolio level, we observe modest improvements in performance across all profitable benchmarks. Portfolio volatility targeting resulted in minimal changes to the deep learning models, allowing them to retain a large gap in their performance ratios over the benchmarks. In particular, the Linear and LSTM models exhibited the best performance, achieving Sharpe ratios of 1.290 and 1.329 respectively, roughly twice those of the benchmarks. We note that the simplest Linear model was able to outperform the MLP, most likely due to our introduction of an L1 regularization penalty only for the Linear model during training. Examining the period during the COVID-19 market selloff, we see clearly from Figure 1 that Long Only exhibited sharp gains during the selloff at the start of 2020, demonstrating the profitability of long straddles during periods of high market volatility. While the deep learning models experienced brief drawdowns during the selloff, their performance swiftly recovered following the market rebound.

6.4 Transaction Costs and Turnover Regularization

Key challenges of rebalancing an options portfolio include market microstructure considerations arising from market liquidity and bid-ask spreads, which can result in high transaction costs. To examine the impact of transaction costs on the profitability of all strategies, we first compute Sharpe ratios adjusted for transaction costs by taking into account turnover-adjusted returns:

    $$\tilde{r}_{t,t+1}^{\mathrm{STRATEGY}} = \frac{1}{N_t} \sum_{i=1}^{N_t} \left[ R_i(t) - c \cdot \sigma_{\mathrm{tgt}} \left| \frac{X_t^{(i,\mathrm{straddle})}}{\sigma_t^{(i,\mathrm{straddle})}} - \frac{X_{t-1}^{(i,\mathrm{straddle})}}{\sigma_{t-1}^{(i,\mathrm{straddle})}} \right| \right] \tag{13}$$

where $c$ represents a measure of average transaction costs in basis points. Focusing on the LSTM model, we observe from Table 3 that the model maintains superior risk-adjusted performance over the best performing benchmark, TSMR, up to transaction costs of $c = 20$ bps (Sharpe ratio of 0.952 versus 0.663), but falls below it at $c = 50$ bps (0.388 versus 0.514).
   Following the methodology detailed in [24, 32], we modify the training loss to use turnover-adjusted returns as defined in Equation (13), which in effect optimizes the Sharpe ratio while regularizing for the turnover generated by the trading signals of the LSTM. Based on Table 3, we see that turnover regularization further enhances the performance of the LSTM for high transaction costs of $c = 10$ to $50$ bps, allowing the regularized model to outperform other strategies at prohibitively high levels of transaction costs.
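The turnover adjustment in Equation (13) can be illustrated with a short sketch for a single rebalancing date. It assumes that the cost $c$ is quoted in basis points and converted to a fraction by multiplying by $10^{-4}$, and that turnover is measured as the absolute change in the volatility-scaled position; the function name and the sample values are hypothetical.

```python
def turnover_adjusted_return(signals, prev_signals, sigmas, prev_sigmas,
                             next_returns, sigma_tgt, cost_bps):
    """Equation (13): average captured return net of turnover costs at one date t."""
    c = cost_bps * 1e-4  # assumption: basis points converted to a fraction
    total = 0.0
    for x, x_prev, s, s_prev, r in zip(signals, prev_signals, sigmas,
                                       prev_sigmas, next_returns):
        captured = x * (sigma_tgt / s) * r       # R_i(t), Equation (12)
        turnover = abs(x / s - x_prev / s_prev)  # change in scaled position
        total += captured - c * sigma_tgt * turnover
    return total / len(signals)

# Two hypothetical straddles: the first position is newly opened, the second is unchanged.
inputs = dict(
    signals=[1.0, -0.5], prev_signals=[0.0, -0.5],
    sigmas=[0.2, 0.3], prev_sigmas=[0.2, 0.3],
    next_returns=[0.01, -0.02], sigma_tgt=0.15,
)
gross = turnover_adjusted_return(cost_bps=0.0, **inputs)
net = turnover_adjusted_return(cost_bps=10.0, **inputs)
```

Only the newly opened position incurs a cost here, since the second straddle keeps the same scaled position from $t-1$ to $t$. Substituting these net returns for $R_i(t)$ in the Sharpe loss gives the turnover-regularized objective described above.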
7 CONCLUSIONS

We present a general end-to-end framework for trading options using a highly data-driven machine learning algorithm, adopting the point of view of an active investor seeking to profit from options trading. Departing from conventional approaches that typically rely on specific market dynamics or option pricing models, we train end-to-end neural networks to directly learn mappings from options data to optimal trading positions, removing the need to simulate market processes, price options or predict option returns. In backtests on portfolios of delta-neutral equity options, our models demonstrate significant improvements in risk-adjusted performance when calibrated using the Sharpe ratio. Crucially, our framework is agnostic to specific underlying market assumptions, potentially allowing for further extensions to a broader set of derivatives or complex instruments where data is available.

ACKNOWLEDGMENTS

We would like to thank the Oxford-Man Institute of Quantitative Finance for providing compute resources. Wee Ling Tan thanks Bryan Lim, Martin Luk, Guillaume Andrieux and Hans Buehler for their insightful comments.

REFERENCES

 [1] Turan G Bali, Heiner Beckmeyer, Mathis Moerke, and Florian Weigert. 2023. Option Return Predictability with Machine Learning and Big Data. The Review of Financial Studies 36, 9 (2023), 3548–3602. https://doi.org/10.1093/rfs/hhad017
 [2] Jamil Baz, Nicolas Granger, Campbell R Harvey, Nicolas Le Roux, and Sandy Rattray. 2015. Dissecting Investment Strategies in the Cross Section and Time Series. SSRN 2695101 (2015).
 [3] Mikolaj Binkowski, Gautier Marti, and Philippe Donnat. 2018. Autoregressive Convolutional Neural Networks for Asynchronous Time Series. In International Conference on Machine Learning. PMLR, 580–589.
 [4] Fischer Black. 1975. Fact and Fantasy in the Use of Options. Financial Analysts Journal 31, 4 (1975), 36–41. https://doi.org/10.2469/faj.v31.n4.36
 [5] Fischer Black and Myron Scholes. 1973. The Pricing of Options and Corporate Liabilities. Journal of Political Economy 81, 3 (1973), 637–654. http://www.jstor.org/stable/1831029
 [6] Matthias Büchner and Bryan Kelly. 2022. A Factor Model for Option Returns. Journal of Financial Economics 143, 3 (2022), 1140–1161. https://doi.org/10.1016/j.jfineco.2021.12.007
 [7] Hans Buehler, Lukas Gonon, Josef Teichmann, and Ben Wood. 2019. Deep Hedging. Quantitative Finance 19, 8 (2019), 1271–1291. https://doi.org/10.1080/14697688.2019.1571683
 [8] Options Clearing Corporation. 2024. OCC - Historical Volume Statistics. https://www.theocc.com/market-data/market-data-reports/volume-and-open-interest/historical-volume-statistics.
 [9] Joshua D Coval and Tyler Shumway. 2001. Expected Option Returns. The Journal of Finance 56, 3 (2001), 983–1009. https://doi.org/10.1111/0022-1082.00352
[10] John C Cox, Stephen A Ross, and Mark Rubinstein. 1979. Option Pricing: A Simplified Approach. Journal of Financial Economics 7, 3 (1979), 229–263. https://doi.org/10.1016/0304-405X(79)90015-1
[11] Werner FM De Bondt and Richard Thaler. 1985. Does the Stock Market Overreact? The Journal of Finance 40, 3 (1985), 793–805. https://doi.org/10.2307/2327804
[12] Assaf Eisdorfer, Ronnie Sadka, and Alexei Zhdanov. 2022. Maturity Driven Mispricing of Options. Journal of Financial and Quantitative Analysis 57, 2 (2022), 514–542. https://doi.org/10.1017/S002210902100003X
[13] Amit Goyal and Alessio Saretto. 2009. Cross-section of Option Returns and
     8.1735
[17] Jianfeng Hu, Antonia Kirilova, Seongkyu Park, and Doojin Ryu. 2023. Who Profits from Trading Options? Management Science 70, 7 (2023), 4742–4761. https://doi.org/10.1287/mnsc.2023.4916
[18] James M Hutchinson, Andrew W Lo, and Tomaso Poggio. 1994. A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks. The Journal of Finance 49, 3 (1994), 851–889. https://doi.org/10.2307/2329209
[19] Codruț-Florin Ivașcu. 2021. Option Pricing using Machine Learning. Expert Systems with Applications 163 (2021), 113799. https://doi.org/10.1016/j.eswa.2020.113799
[20] Narasimhan Jegadeesh and Sheridan Titman. 1993. Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency. The Journal of Finance 48, 1 (1993), 65–91. https://doi.org/10.1111/j.1540-6261.1993.tb04702.x
[21] Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 (2014).
[22] Petter N Kolm and Gordon Ritter. 2019. Dynamic Replication and Hedging: A Reinforcement Learning Approach. The Journal of Financial Data Science 1, 1 (2019), 159–171. https://doi.org/10.3905/jfds.2019.1.1.159
[23] Bryan Lim and Stefan Zohren. 2021. Time Series Forecasting With Deep Learning: A Survey. Philosophical Transactions of the Royal Society A 379, 2194 (2021), 20200209. https://doi.org/10.1098/rsta.2020.0209
[24] Bryan Lim, Stefan Zohren, and Stephen Roberts. 2019. Enhancing Time-Series Momentum Strategies Using Deep Neural Networks. The Journal of Financial Data Science 1, 4 (2019), 19–38. https://doi.org/10.3905/jfds.2019.1.015
[25] Robert C Merton. 1973. Theory of Rational Option Pricing. The Bell Journal of Economics and Management Science 4, 1 (1973), 141–183. https://doi.org/10.2307/3003143
[26] Tobias J Moskowitz, Yao Hua Ooi, and Lasse Heje Pedersen. 2012. Time Series Momentum. Journal of Financial Economics 104, 2 (2012), 228–250. https://doi.org/10.1016/j.jfineco.2011.11.003
[27] Daniel Poh, Bryan Lim, Stefan Zohren, and Stephen Roberts. 2021. Building Cross-Sectional Systematic Strategies by Learning to Rank. The Journal of Financial Data Science 3, 2 (2021), 70–86. https://doi.org/10.3905/jfds.2021.1.060
[28] Daniel Poh, Bryan Lim, Stefan Zohren, and Stephen Roberts. 2022. Enhancing Cross-Sectional Currency Strategies by Context-Aware Learning to Rank with Self-Attention. The Journal of Financial Data Science 4, 3 (2022), 89–107. https://doi.org/10.3905/jfds.2022.1.099
[29] James M Poterba and Lawrence H Summers. 1988. Mean Reversion in Stock Prices: Evidence and Implications. Journal of Financial Economics 22, 1 (1988), 27–59. https://doi.org/10.1016/0304-405X(88)90021-9
[30] K Geert Rouwenhorst. 1998. International Momentum Strategies. The Journal of Finance 53, 1 (1998), 267–284.
[31] William F Sharpe. 1994. The Sharpe Ratio. The Journal of Portfolio Management 21, 1 (1994), 49–58. https://doi.org/10.3905/jpm.1994.409501
[32] Wee Ling Tan, Stephen Roberts, and Stefan Zohren. 2023. Spatio-Temporal Momentum: Jointly Learning Time-Series and Cross-Sectional Strategies. The Journal of Financial Data Science 5, 3 (2023), 107–129. https://doi.org/10.3905/jfds.2023.1.130
[33] Meng Tian and Liuren Wu. 2023. Limits of Arbitrage and Primary Risk-Taking in Derivative Securities. The Review of Asset Pricing Studies 13, 3 (2023), 405–439. https://doi.org/10.1093/rapstu/raad003
[34] Aurelio Vasquez. 2017. Equity Volatility Term Structures and the Cross Section of Option Returns. Journal of Financial and Quantitative Analysis 52, 6 (2017), 2727–2754. https://doi.org/10.1017/S002210901700076X
[35] Kieran Wood, Sven Giegerich, Stephen Roberts, and Stefan Zohren. 2023. Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture. arXiv:2112.08534, Risk (2023).
[36] Kieran Wood, Stephen Roberts, and Stefan Zohren. 2022. Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection. The Journal of Financial Data Science 4, 1 (2022), 111–129. https://doi.org/10.3905/jfds.2021.1.081

A HYPERPARAMETER OPTIMIZATION

Table 4: Hyperparameter Search Range
     Volatility. Journal of Financial Economics 94, 2 (2009), 310–326. https://doi.org/
     10.1016/j.jfineco.2009.01.001
[14] Campbell R Harvey, Edward Hoyle, Russell Korgaonkar, Sandy Rattray, Matthew                Hyperparameters                Search Grid
     Sargaison, and Otto Van Hemert. 2018. The Impact of Volatility Targeting. The
     Journal of Portfolio Management 45, 1 (2018), 14–33. https://doi.org/10.3905/jpm.          Minibatch Size                 32, 64, 128, 256
     2018.45.1.014                                                                              Dropout Rate                   0.1, 0.2, 0.3, 0.4, 0.5
[15] Steven L Heston, Christopher S Jones, Mehdi Khorram, Shuaiqi Li, and Haitao
     Mo. 2023. Option Momentum. The Journal of Finance 78, 6 (2023), 3141–3192.
                                                                                                Hidden Layer Size              5, 10, 20, 40, 80, 160
     https://doi.org/10.1111/jofi.13279                                                         Learning Rate                  10 −5, 10 −4, 10 −3, 10 −2, 10 −1, 100
[16] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory.                      Max Gradient Norm              10 −4, 10 −3, 10 −2, 10 −1, 100, 101
     Neural Computation 9, 8 (1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.