Reading Stock Data
Reading Stock Data (Quotes and Trades)
Outline
• Reading CSV files into Python
• Revisiting Class Templates
• Classes make for a convenient storage container for stocks
• Many stocks are traded, all have same format
• Parsing files using conditional statements
Current Directory
(IPython)
Tools that may be helpful:
• Identify current directory
• pwd or
• os.getcwd()
• Change directory
• Use ‘magic command’ %cd
• Use forward slashes “/” in Python paths
• (PC) %cd “C:/Folder/Folder”
• (Mac) %cd “/Root/Usr/Folder”
• Identify files in directory
• ls
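Outside IPython, the same three tasks can be done with the standard os module. A minimal sketch; the temporary folder is only a stand-in for a real data directory (an assumption, not part of the slides).

```python
import os
import tempfile

start_dir = os.getcwd()            # identify the current directory (like pwd)

# A temporary folder stands in for a real data directory (assumption).
with tempfile.TemporaryDirectory() as data_dir:
    os.chdir(data_dir)             # change directory (like %cd)
    new_dir = os.getcwd()
    files = os.listdir(".")        # identify files in the directory (like ls)
    os.chdir(start_dir)            # go back before the folder is deleted

print(start_dir)
print(files)
```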
Open file (Method 1)
• Open file
• Use open()
(count is a variable name, not a built-in method)
• Mode (Not always needed)
• mode=‘r’
• mode=‘rb’ (Depends on file)
• Close to release access back
to operating system
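A minimal sketch of Method 1. The file name, the comma delimiter, and the two sample rows are made up so the example can run on its own.

```python
import os
import tempfile

# Write a tiny sample file so the example is self-contained.
# File name and rows are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
f = open(path, mode="w")
f.write("MQ,AAPL,34200,100.00,200,100.02,300\n")
f.write("TD,AAPL,34201,100.01,100\n")
f.close()

# Method 1: open, use, then close explicitly.
f = open(path, mode="r")
count = 0                  # count is just a variable name we chose
for line in f:
    count += 1
f.close()                  # release the file back to the operating system

print(count)
```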
Open file (Method 2)
• Use with open()
• Automatically closes the
file when done
• Method 1 appears intuitive
• Method 2 is preferred because the
file is closed automatically
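A minimal sketch of Method 2, using a made-up sample file (hypothetical name and rows). No call to close() is needed: the file is closed when the with block ends.

```python
import os
import tempfile

# Hypothetical sample file, written with the same "with open" pattern.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, mode="w") as f:
    f.write("MQ,AAPL,34200,100.00,200,100.02,300\n")
    f.write("TD,AAPL,34201,100.01,100\n")

# Method 2: the file is closed automatically at the end of the block.
with open(path, mode="r") as f:
    count = 0
    for line in f:
        count += 1

print(count, f.closed)
```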
Class Templates
• If we plan to use a similar object
structure for multiple variables:
• All stocks are similar …
• Use class to create a template
• Allows objects to be created dynamically (at run time)
• Otherwise, it would be difficult to
hard-code all possible stocks that
will be read into our program
• Input variable names don’t matter
(But if using generic names, use a comment to help the reader)
Both class templates shown on the slide are identical and valid code
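The slide's code is not reproduced in this text, so the following is only a sketch of what two equivalent templates might look like. The class name stockHolder and the attributes trades, quotes, and tradeVolume come from later slides; stockHolderGeneric and the parameter name x are hypothetical.

```python
class stockHolder:
    def __init__(self, symbol):
        self.symbol = symbol
        self.trades = 0
        self.quotes = 0
        self.tradeVolume = 0


class stockHolderGeneric:
    def __init__(self, x):         # x is the stock symbol
        self.symbol = x
        self.trades = 0
        self.quotes = 0
        self.tradeVolume = 0


a = stockHolder("AAPL")
b = stockHolderGeneric("AAPL")
print(vars(a) == vars(b))          # both objects hold the same data
```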
Functions
• For multiple reasons, it may be
convenient to define functions
• At a global level
• Can (sometimes) improve script
readability
• Can help with profiling
(Identify which parts are slow)
• At the class level
• If we know that each class is going to
have the same operation
(Will revisit in a future class period)
Both code files shown on the slide are identical and valid
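A small sketch of the two levels. The helper split_row, the method add_trade, the comma delimiter, and the sample row are all hypothetical names and data chosen for illustration.

```python
def split_row(line):
    """Global-level function: turn one text row into a list of fields."""
    return line.strip().split(",")


class stockHolder:
    def __init__(self, symbol):
        self.symbol = symbol
        self.trades = 0
        self.tradeVolume = 0

    def add_trade(self, size):
        """Class-level function: the same operation for every stock."""
        self.trades += 1
        self.tradeVolume += size


fields = split_row("TD,AAPL,34201,100.01,100\n")
stock = stockHolder(fields[1])
stock.add_trade(int(fields[4]))
print(stock.symbol, stock.trades, stock.tradeVolume)
```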
Profiling the data
• Be mindful of the scope in which you
place print statements
(intentional and unintentional)
• Consider the scripts to the right
and corresponding output:
• print in innermost for loop:
• Prints line for entire file
• print within with open level:
• Prints only once at end of with open
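The two placements can be compared in one short sketch. Here io.StringIO stands in for open() so that no file on disk is needed, and the three rows are made up.

```python
import io

# Hypothetical sample data; io.StringIO behaves like an opened text file.
sample = (
    "MQ,AAPL,34200,100.00,200,100.02,300\n"
    "TD,AAPL,34201,100.01,100\n"
    "MQ,MSFT,34202,50.00,100,50.03,400\n"
)

with io.StringIO(sample) as f:
    count = 0
    for line in f:
        count += 1
        print("inside the for loop:", count)       # runs once per line
    print("inside with, after the loop:", count)   # runs only once
```

The first print produces one line of output per row of the file; the second produces a single line after the whole file has been read.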
Understanding the Data
• Two Message Types: “MQ” or “TD”
• MQ: Midquote
Structure (index: field):
[0] “MQ”  [1] Symbol  [2] Seconds  [3] Best Bid  [4] BB Shares  [5] Best Offer  [6] BO Shares
• TD: Trade Message
Structure (index: field):
[0] “TD”  [1] Symbol  [2] Seconds  [3] Trade Price  [4] Trade Size
Understanding the Data
Trying to profile (incorrectly)
• Each line is read as a single string
• Reading values by index won’t work: an index into a string returns one character, not a field
Profiling (one of many correct ways)
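A sketch of the difference, assuming the fields are separated by commas (the slides describe a CSV file) and using a made-up row:

```python
line = "MQ,AAPL,34200,100.00,200,100.02,300\n"   # hypothetical row

# Incorrect: indexing the raw string returns a single character.
wrong = line[1]

# One correct way: split the string into a list of fields first.
fields = line.strip().split(",")
right = fields[1]

print(repr(wrong), repr(right), len(fields))
```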
Understanding the Data
From prior slides:
• Midquote records have 7 elements
• Trade records have 5 elements
• Midquote and Trade share the variables
[msgType, Symbol, seconds]
Use an if statement to choose when
to assign other variables
• if Quote: create BB, BBsize, BO, and BOsize
• if Trade: create tdPrice and tdSize
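One way this could look, as a sketch. The comma delimiter, the sample rows, and the parsed list are assumptions; since there are only two message types, any row that is not "MQ" is treated as a trade here.

```python
rows = [                                   # hypothetical sample rows
    "MQ,AAPL,34200,100.00,200,100.02,300\n",
    "TD,AAPL,34201,100.01,100\n",
]

parsed = []
for line in rows:
    fields = line.strip().split(",")

    # Shared by both message types.
    msgType = fields[0]
    symbol = fields[1]
    seconds = float(fields[2])

    if msgType == "MQ":                    # quote: 7 elements
        BB = float(fields[3])
        BBsize = int(fields[4])
        BO = float(fields[5])
        BOsize = int(fields[6])
        parsed.append((msgType, symbol, seconds, BB, BBsize, BO, BOsize))
    else:                                  # trade: 5 elements
        tdPrice = float(fields[3])
        tdSize = int(fields[4])
        parsed.append((msgType, symbol, seconds, tdPrice, tdSize))

print(parsed)
```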
Program Objective
• It helps to write out what we want to do with the program:
• Create a program that reads a file of stock-trading data for one day
• Calculate summaries for each stock (# of trades, # of quotes, etc.)
• The rows are sequential in time
• We know that the seconds will go from low to high
• No sorting according to stock
• Stocks come in any order
• No guarantee which stocks we will read and which ones we will not
• We can’t static-code the stock symbols
Program Objective
• Two foreseeable obstacles:
• Where do we store the data? A class object
• How do we access and update the above object each time we read
a row for a random stock symbol?
• From last week, we know that a dictionary container has the structure {Key: Value}
• In our problem, we can set
• Key = Stock Symbol (Look up dictionaries according to symbol)
• Value = Class object (The item (i.e. value) we get is the class object)
• Every time we read a line, get the corresponding class Object by
using the lookup key
Program Objective
Note: stockHolder requires an input of symbol
Starting our program
• stockHolder is the class template
• stockDict is dictionary container
• If we are static coding:
stockDict["AAPL"] = stockHolder("AAPL")
• But we want to use dynamic coding for
any symbol that we encounter:
stockDict[symbol] = stockHolder(symbol)
• symbol in this program is a variable name we created to identify the
symbol for each row of data
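A sketch contrasting the two styles. The body of stockHolder and the list of symbols are assumptions made so the example runs on its own; in the real program, symbol would come from each row of the file.

```python
class stockHolder:
    def __init__(self, symbol):        # requires an input of symbol
        self.symbol = symbol
        self.trades = 0
        self.quotes = 0
        self.tradeVolume = 0


stockDict = {}

# Static coding: the symbol is typed into the program.
stockDict["AAPL"] = stockHolder("AAPL")

# Dynamic coding: the symbol is whatever the variable holds.
for symbol in ["MSFT", "IBM"]:         # hypothetical symbols
    stockDict[symbol] = stockHolder(symbol)

print(sorted(stockDict))
```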
Dynamic Objects
From earlier slides, we know every
line generates a symbol variable
• Using symbol as our dictionary key:
• First time we encounter a stock:
• Create a class object and insert it into the
dictionary
• {key = symbol : value = object}
• Every time after first:
• Access the object for use in updating data
• stockObject is our dynamic variable for
identifying a stock’s class instance
• Update the object as needed
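The steps above can be sketched as follows. The sample rows, the comma delimiter, and the body of stockHolder are assumptions; any row that is not "MQ" is treated as a trade.

```python
class stockHolder:
    def __init__(self, symbol):
        self.symbol = symbol
        self.trades = 0
        self.quotes = 0
        self.tradeVolume = 0


rows = [                                   # hypothetical sample rows
    "MQ,AAPL,34200,100.00,200,100.02,300\n",
    "TD,AAPL,34201,100.01,100\n",
    "MQ,MSFT,34202,50.00,100,50.03,400\n",
    "TD,AAPL,34203,100.02,50\n",
]

stockDict = {}
for line in rows:
    fields = line.strip().split(",")
    msgType, symbol = fields[0], fields[1]

    # First time we encounter a stock: create its object.
    if symbol not in stockDict:
        stockDict[symbol] = stockHolder(symbol)

    # Every time: look up the object for this row's symbol.
    stockObject = stockDict[symbol]

    # Update the object as needed.
    if msgType == "MQ":
        stockObject.quotes += 1
    else:
        stockObject.trades += 1
        stockObject.tradeVolume += int(fields[4])

print(stockDict["AAPL"].trades, stockDict["AAPL"].tradeVolume)
```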
Dynamic Objects
Report the summary statistics for each
file by looping through the dictionary
• For every key and value pair:
• Print: symbol (i.e. the key)
• Print items from our value (Object)
• object.trades
• object.quotes
• object.tradeVolume
(key, value) are variable names we choose. We could call them (k, v), or (symbol, object), etc.
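A sketch of the reporting loop, with hand-filled values standing in for the results of reading a file. The loop variables are named (symbol, stock) here; stock is used instead of object only to avoid hiding Python's built-in name object. The report list is a hypothetical extra that keeps the printed lines.

```python
class stockHolder:
    def __init__(self, symbol):
        self.symbol = symbol
        self.trades = 0
        self.quotes = 0
        self.tradeVolume = 0


# Hypothetical summary values, filled in by hand for illustration.
stockDict = {"AAPL": stockHolder("AAPL"), "MSFT": stockHolder("MSFT")}
stockDict["AAPL"].trades = 2
stockDict["AAPL"].quotes = 1
stockDict["AAPL"].tradeVolume = 150
stockDict["MSFT"].quotes = 1

report = []
for symbol, stock in stockDict.items():    # (key, value) pairs
    text = (f"{symbol}: trades={stock.trades}, quotes={stock.quotes}, "
            f"tradeVolume={stock.tradeVolume}")
    print(text)
    report.append(text)
```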