Get It Done in 5 seconds!
Are you bored of doing the same stuff again and again?
Do you feel that your life is just the same thing over and over?
Here is the thing: today I am going to introduce a tool to automate your BORING stuff, Python. Python is perhaps the easiest language to learn. With Python skills, you will be able not only to increase your productivity but also to focus on work that interests you more.
I will use an example, paper trading in the Singapore stock market, to illustrate how automation can be done. Paper trading allows you to practice investing or trading with virtual money before you put real money in. It is a good way to start, because it lets you test whether your strategy works.
Here is the agenda:
Part 1: Input the stock code and amount which you want to trade in a text file.
Part 2: How to do Web Scraping on your own, the full journey.
Part 3: Clean and tabulate data.
Part 4: Output the result into a CSV or Excel file.
Follow the whole journey and you will notice how simple it is to automate your boring stuff and to update your price IN 5 Seconds.
Part 1: Input the stock code and amount which you want to trade in a text file
Create a new text file and, on each line, enter a stock code and the price at which you will buy that stock, separated by a comma.
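As a minimal sketch of what such a file could contain, the snippet below writes a small selected.txt. The stock codes and prices are placeholder values chosen only for illustration, and `write_portfolio_file` is a hypothetical helper name, not part of the original code.

```python
from pathlib import Path

# Illustrative entries only: one "stock_code,bought_price" pair per line.
SAMPLE_LINES = [
    "D05,25.00",
    "O39,11.50",
    "U11,26.30",
]


def write_portfolio_file(path="selected.txt", lines=SAMPLE_LINES):
    """Write one 'stock_code,bought_price' entry per line and return the path."""
    target = Path(path)
    target.write_text("\n".join(lines) + "\n")
    return target
```

Calling `write_portfolio_file()` creates selected.txt in the current folder; you can equally type the same lines into any text editor.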
Part 2: How to do Web Scraping on your own, the full journey
This is a snapshot of the SGX website.
I am going to illustrate how to scrape all the trading information contained in this table. Open Google Chrome, right-click on the page, and you will see the snapshot below.
Click Inspect, then click the Network tab (top right corner of the snapshot below, highlighted with a purple bracket).
Next, click the row highlighted in the purple box and then choose Preview, highlighted in the green box; both are shown in Snapshot 4 below.
As you can see from the Preview, all the data is in JSON format. Next, click Headers (the purple box in Snapshot 5).
What I am doing now is inspecting which elements I need in order to scrape data from this page. In Snapshot 5 above you can see the Request URL, which is the URL you will put in the request later. Each “%2c” in the Request URL is the percent-encoded (URL-encoded) form of “,”, so it should be decoded back to a comma. If you are interested in encoding, view this link for more information.
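If you would rather not replace each “%2c” by hand, Python's standard library can decode it for you. The snippet below is a small sketch; the query fragment is a hypothetical example, not the actual SGX Request URL.

```python
from urllib.parse import quote, unquote

# Hypothetical query fragment, used only to illustrate percent-encoding.
encoded = "params=nc%2clt%2cvl"

# unquote() turns every percent-encoded sequence back into its character.
decoded = unquote(encoded)
print(decoded)  # params=nc,lt,vl

# Going the other way, quote() encodes "," as "%2C".
print(quote("nc,lt,vl"))  # nc%2Clt%2Cvl
```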
Now let’s prepare the required information for you to send a proper request to the server.
Part 1 Request URL
After changing every “%2c” to “,”, the request URL becomes the link below.
Part 2 Headers
A request header is the part of an HTTP request that a browser or other client sends to the server along with its request for a specific page or piece of data.
Referring to the purple box in Snapshot 6, this is the header part which you should include when you are scraping the website.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36",
    "Origin": "https://www2.sgx.com",
    "Referer": "https://www2.sgx.com/securities/securities-prices",
}
Now let’s put everything together as shown in the gist below.
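As a minimal sketch of this step, assuming the requests library is installed, the request could be sent like this. `REQUEST_URL` is a placeholder that you must replace with the decoded Request URL from the Network tab, and `fetch_response` with its optional `session` argument is a hypothetical helper, not the author's code.

```python
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36",
    "Origin": "https://www2.sgx.com",
    "Referer": "https://www2.sgx.com/securities/securities-prices",
}

# Placeholder: paste the decoded Request URL from the Network tab here.
REQUEST_URL = "https://example.com/replace-with-request-url"


def fetch_response(url=REQUEST_URL, headers=HEADERS, session=None):
    """Send a GET request with browser-like headers and return the response."""
    if session is None:
        # Imported lazily so the function can also be tried with a stub session.
        import requests
        session = requests.Session()
    response = session.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response
```

With the real URL filled in, `req = fetch_response()` gives you a response whose `req.text` holds the JSON used in the next part.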
Part 3: Clean Data
By now you have the response in JSON format. We will use the Python pandas library to clean the data.
First, load the stock codes which you filled in earlier and clean them.
# Read one "stock_code,price" entry per line, skipping blank lines.
with open('selected.txt') as f:
    selected_sc = [x.strip() for x in f if x.strip()]
# Map each stock code to the price entered for it.
portfolio = {x.split(',')[0].strip(): float(x.split(',')[1]) for x in selected_sc}
Then, parse the scraped JSON text into a Python object and convert it to a pandas DataFrame.
import json
import pandas as pd
data = json.loads(req.text)['data']  # req is the response from Part 2
df = pd.DataFrame(data['prices'])
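To see the shape of data this step expects, here is a small sketch with a made-up payload. The nesting under "data" and "prices" and the short field names follow the code in this article; the stock codes and numbers are invented for illustration, and `prices_to_frame` is a hypothetical helper.

```python
import json

import pandas as pd

# Made-up payload mimicking the {"data": {"prices": [...]}} structure.
SAMPLE_TEXT = json.dumps({
    "data": {
        "prices": [
            {"nc": "AAA", "lt": 1.10, "vl": 1000.0},
            {"nc": "BBB", "lt": 2.50, "vl": 500.0},
        ]
    }
})


def prices_to_frame(response_text):
    """Parse the JSON text and return one DataFrame row per price record."""
    data = json.loads(response_text)["data"]
    return pd.DataFrame(data["prices"])


df = prices_to_frame(SAMPLE_TEXT)
print(df.columns.tolist())  # ['nc', 'lt', 'vl']
print(len(df))              # 2
```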
Next, rename the columns so that they are easier to understand.
df = df.rename(
    columns={'b': 'Bid',
             'lt': 'Last',
             'bv': 'Bid_Volume',
             'c': 'Change',
             'sv': 'Ask_volume',
             'h': 'High',
             'l': 'Low',
             'o': 'open',
             'p': 'Change_percent',
             's': 'Ask',
             'vl': 'Volume',
             'nc': 'Stock_code'})
Finally, keep only the stock codes which you want to invest or trade in, and then calculate the percentage change between the last price and your bought price.
df = df[df['Stock_code'].isin(portfolio.keys())][['Stock_code', 'Last']].copy()
df['bought_price'] = df['Stock_code'].map(portfolio)
# Percentage change relative to the bought price.
df['percentage_changes'] = (df['Last'] - df['bought_price']) / df['bought_price'] * 100
df['percentage_changes'] = df['percentage_changes'].apply(lambda x: '{0:.2f}%'.format(x))
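As a worked example of the arithmetic, a percentage change relative to the bought price divides the price difference by the bought price before multiplying by 100. The sketch below uses plain Python and invented prices; `percentage_change` and `format_percentage` are hypothetical helper names.

```python
def percentage_change(last_price, bought_price):
    """Return the change from bought_price to last_price, in percent."""
    return (last_price - bought_price) / bought_price * 100


def format_percentage(value):
    """Format a number with two decimals and a percent sign."""
    return "{0:.2f}%".format(value)


# Invented prices, for illustration only.
print(format_percentage(percentage_change(1.10, 1.00)))  # 10.00%
print(format_percentage(percentage_change(0.90, 1.00)))  # -10.00%
```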
Part 4: Output the result in a CSV or Excel file
Save the data to a CSV file and 🎉WE ARE OFFICIALLY DONE! 🎉
df.to_csv('result.csv', index=False)
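If you prefer an Excel file, pandas also offers `to_excel`. The sketch below picks the format from the file extension; `save_result` is a hypothetical helper, and writing .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def save_result(df, path):
    """Save df as Excel if the path ends in .xlsx, otherwise as CSV."""
    path = Path(path)
    if path.suffix.lower() == ".xlsx":
        # to_excel needs an Excel engine such as openpyxl to be installed.
        df.to_excel(path, index=False)
    else:
        df.to_csv(path, index=False)
    return path
```

For example, `save_result(df, 'result.csv')` writes the same CSV as above, while `save_result(df, 'result.xlsx')` writes an Excel workbook.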
Below is the snapshot of the csv file:
The full code can be found here.
Happy coding!
Final Thought
I am currently working as a Data Scientist, and what I can tell you is that crawling is still very important.
Thank you for reading this post. Feel free to leave comments below on topics which you may be interested to know. I will be publishing more posts in future about my experiences and projects.
This content was originally published here.
About Author
Low Wei Hong is a Data Scientist at Shopee. His experience mostly involves crawling websites, creating data pipelines, and implementing machine learning models to solve business problems.
He provides crawling services that can deliver the accurate, cleaned data you need. You can visit this website to view his portfolio and to contact him for crawling services.