I got the idea for this article while experimenting with Python automation, so I decided to write a simple tutorial on running tasks periodically using Python and cronjobs.
Keep in mind that this demonstration is by no means exhaustive. However, the core concepts apply to other use cases.
First, here’s an overview of the Python script that will run periodically on my machine.
I will explain each section of this script in more depth. If you already understand the script, feel free to skip ahead to the scheduling section of this article.
```python
import os
import json
import yagmail
import datetime
import yfinance as yf
from dotenv import load_dotenv

load_dotenv()

FILE_PATH = os.getenv('FILE_PATH')
DEV_EMAIL = os.getenv('DEV_EMAIL')
DEV_PASSWORD = os.getenv('DEV_PASSWORD')
NOTIFICATION_EMAIL = os.getenv('NOTIFICATION_EMAIL')

yesterday = datetime.date.today() - datetime.timedelta(days=1)

data = yf.download(
    tickers="AAPL MSFT",
    group_by="ticker",
    start=yesterday,
    end=yesterday
)[:1]

records = {}
try:
    with open(FILE_PATH, 'r') as file:
        records = json.load(file)
except (FileNotFoundError, json.JSONDecodeError) as e:
    pass

parsed = {}
for ticker, column in data:
    parsed[ticker] = {"Date": str(yesterday)} if ticker not in parsed else parsed[ticker]
    # Volume is a whole number; every other column is a price
    parsed[ticker][column] = int(data[ticker][column]) if column == 'Volume' else float(data[ticker][column])

for ticker in parsed:
    if ticker in records:
        records[ticker].append(parsed[ticker])
    else:
        records[ticker] = [parsed[ticker]]

with open(FILE_PATH, 'w') as file:
    json.dump(records, file, indent=2)

yagmail.SMTP(
    DEV_EMAIL,
    DEV_PASSWORD
).send(
    to=NOTIFICATION_EMAIL,
    subject='Daily Financial Data',
    contents='Your daily financial record update is complete!'
)
```
## Script breakdown
Let’s talk about the standout packages used in this script:
- yagmail is a Gmail/SMTP client package that I use to send emails. It isn’t a vital feature, but I will explain my reasons for using it later in the article. There are other ways to send emails with Python; this is my preferred option when I simply need to send an email with code that fits in one line.
- dotenv (python-dotenv) allows us to load environment variables from a .env file in our project. This means we don’t have to fiddle with our system’s environment variables directly.
- yfinance is my library of choice for accessing Yahoo Finance data.
```python
load_dotenv()

# Path to JSON file used for storage
FILE_PATH = os.getenv('FILE_PATH')
DEV_EMAIL = os.getenv('DEV_EMAIL')
DEV_PASSWORD = os.getenv('DEV_PASSWORD')
# Email to notify when the task is completed
NOTIFICATION_EMAIL = os.getenv('NOTIFICATION_EMAIL')
```
Our first action is loading the environment variables from the .env file using the load_dotenv() function. This function looks for a .env file in the current and parent directories.
Once the file is found, the variables are loaded and can be accessed like any system environment variable using the os package.
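One thing to watch for: os.getenv returns None when a variable is missing, and that usually surfaces much later as a confusing error. Here is a minimal sketch of a guard for that case. Note that `require_env` is a hypothetical helper of my own and not part of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # os.getenv returns None for unset variables; empty strings are rejected too
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

After calling load_dotenv(), something like `FILE_PATH = require_env('FILE_PATH')` would stop the script early if the .env file was not found. This matters under cron, where nobody is watching the terminal.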
```python
yesterday = datetime.date.today() - datetime.timedelta(days=1)

data = yf.download(
    tickers="AAPL MSFT",
    group_by="ticker",
    start=yesterday,
    end=yesterday
)[:1]
```
Now for the exciting part. Here, I’m using yfinance to download Microsoft and Apple data from the previous day. You can read more about this package in its documentation.
The object returned by the download method is a pandas DataFrame. I truncate it to the first row of the results. This is only a safety measure, as only one row is expected anyway.
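As far as I know, yfinance treats the end date as exclusive, so depending on the version, a request where start and end are the same day can come back empty. Here is a hedged sketch of building the date window with an exclusive end. `previous_day_window` is a hypothetical helper, not something yfinance provides.

```python
import datetime
from typing import Tuple


def previous_day_window(today: datetime.date) -> Tuple[datetime.date, datetime.date]:
    """Return (start, end) covering the day before `today`, with `end` exclusive."""
    start = today - datetime.timedelta(days=1)
    end = today  # exclusive upper bound, so only `start` itself is covered
    return start, end
```

The two values would then be passed as start= and end= to yf.download. Markets are closed on weekends and holidays, so the result can still be empty on some days.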
```python
records = {}
try:
    with open(FILE_PATH, 'r') as file:
        records = json.load(file)
except (FileNotFoundError, json.JSONDecodeError) as e:
    pass
```
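This section of the code loads the records that we already have. The records are currently stored in a JSON file specified by FILE_PATH. If the file doesn’t exist yet or doesn’t contain valid JSON, we simply start with an empty dictionary.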
If you’re downloading a lot of data, I would advise you to use a database such as MongoDB or MySQL.
```python
parsed = {}
for ticker, column in data:
    parsed[ticker] = {"Date": str(yesterday)} if ticker not in parsed else parsed[ticker]
    # Volume is a whole number; every other column is a price
    parsed[ticker][column] = int(data[ticker][column]) if column == 'Volume' else float(data[ticker][column])

for ticker in parsed:
    if ticker in records:
        records[ticker].append(parsed[ticker])
    else:
        records[ticker] = [parsed[ticker]]

with open(FILE_PATH, 'w') as file:
    json.dump(records, file, indent=2)
```
In this section, I parse the data manually into a Python dictionary. I chose to do this because I’m more comfortable with basic Python dictionaries than pandas DataFrames.
If you’re more comfortable dealing with DataFrames, it’s possible to export the data to a JSON file directly.
After parsing the data, we append it to the records dictionary and then overwrite the JSON file with the new records data.
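The append-and-overwrite step can also be sketched as two small functions. This is only an illustration: `append_records` and `save_records` are hypothetical names, and the temporary-file swap is my own addition rather than something the script above does. The idea is that an interrupted run is less likely to leave a half-written JSON file behind.

```python
import json
import os
import tempfile


def append_records(records: dict, parsed: dict) -> dict:
    """Append each ticker's parsed row to its list in records, creating the list if needed."""
    for ticker, row in parsed.items():
        records.setdefault(ticker, []).append(row)
    return records


def save_records(records: dict, file_path: str) -> None:
    """Write records to a temporary file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(file_path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False, suffix=".tmp") as tmp:
        json.dump(records, tmp, indent=2)
    # os.replace swaps the file in a single step when both paths are on the same filesystem
    os.replace(tmp.name, file_path)
```

In the script, this would replace the final loop and the last `with open(...)` block: `save_records(append_records(records, parsed), FILE_PATH)`.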
```python
yagmail.SMTP(
    DEV_EMAIL,
    DEV_PASSWORD
).send(
    to=NOTIFICATION_EMAIL,
    subject='Daily Financial Data',
    contents='Your daily financial record update is complete!'
)
```
Finally, we use yagmail to send an email notification confirming that the script has been executed.
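If you would rather not install yagmail, roughly the same notification can be sent with the standard library. This is a sketch under assumptions, not the approach used in this article: `build_notification` and `send_notification` are hypothetical names, and the host and port assume Gmail’s SMTP endpoint over SSL.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender: str, recipient: str) -> EmailMessage:
    """Build the same notification email the script sends through yagmail."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Daily Financial Data"
    message.set_content("Your daily financial record update is complete!")
    return message


def send_notification(message: EmailMessage, password: str) -> None:
    """Send the message, logging in as the address in its From header."""
    # Assumption: Gmail over SSL; other providers use different hosts and ports
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(message["From"], password)
        server.send_message(message)
```

It takes more lines than the yagmail version, which is exactly why I prefer yagmail for simple one-off notifications.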
## Scheduling execution
### Using loops
The next challenge is scheduling the execution of this script. One option would be to wrap everything in an endless while loop and sleep at the end of the loop. This would look like this:
```python
import time

# ...

while True:
    # Execute script here
    time.sleep(60 * 60 * 24)  # Seconds to sleep
```
The issue with this approach is that we have to make sure the script is always running. This means we have to run it manually and leave it running in the background. This defeats the purpose of automation as the script will not run if we ever forget to execute it.
Another problem with this script is that it’s not optimal. We’re sleeping for 24 hours! Unless you have a very specific scenario where this is necessary, if you ever find yourself sleeping for 24 hours, it’s time to consider cronjobs (or see a doctor!).
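If you do stick with a loop, a fixed 24-hour sleep also drifts, because every iteration adds the script’s own run time to the interval. Here is a minimal sketch of sleeping until a fixed wall-clock time instead. `seconds_until` is a hypothetical helper, and it uses naive datetimes, so it ignores daylight saving changes.

```python
import datetime


def seconds_until(now: datetime.datetime, run_at: datetime.time) -> float:
    """Seconds from `now` until the next occurrence of the wall-clock time `run_at`."""
    target = datetime.datetime.combine(now.date(), run_at)
    if target <= now:
        # Today's slot has already passed, so aim for tomorrow's
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()
```

Inside the loop you would call `time.sleep(seconds_until(datetime.datetime.now(), datetime.time(0, 0)))`. Even then, the script still has to be started manually and kept alive, which cron takes care of for us.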
### Using cronjobs
Cronjobs are the more effective solution for scheduling the execution of this script. There are 2 ways to achieve this:
- Create a bash script that executes the python script
- Execute the python script directly (recommended)
Before implementing either of these approaches, we first have to find the location of the Python interpreter on our system. On macOS or Linux, you can find it using the “which” command followed by the python command.
```bash
$ which python3
/Library/Frameworks/Python.framework/Versions/3.7/bin/python3
```
In this case, I’m specifically looking for the python3 interpreter as I want to execute this script with python 3. On my machine, the “python” command maps to python 2.7.
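The same information is available from inside Python, which is handy for checking which interpreter a script actually runs under. This is a small sketch using only the standard library; `interpreter_paths` is a hypothetical helper name.

```python
import shutil
import sys


def interpreter_paths() -> dict:
    """Report the running interpreter and the first python3 found on PATH."""
    return {
        # Absolute path of the interpreter executing this code
        "current": sys.executable,
        # Roughly what `which python3` reports; None if python3 is not on PATH
        "python3_on_path": shutil.which("python3"),
    }


print(interpreter_paths())
```

If the two values differ, the `python3` command in your terminal points at a different installation than the one running the code.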
#### Execute through a bash script
```bash
#!/bin/bash
/Library/Frameworks/Python.framework/Versions/3.7/bin/python3 ${HOME}/<path-to-python script>
```
The code above is contained in a bash script that will execute the python script. The bash script basically contains the command we would input on the terminal but with some key differences.
On the terminal we would run this script from the project root as follows:
```bash
$ python3 script.py
```
In the bash script above, we do something similar, except that instead of the python3 command, we specify the full path of the python3 interpreter. The second part of the command is the full path of the script we want to run.
Keep in mind that the paths to the interpreter and the script are absolute: they start from the system root, NOT from the project root or the user’s home directory.
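The same caution applies inside the Python script. Cron usually does not start jobs from the project directory, so a relative FILE_PATH may resolve somewhere unexpected. Here is a sketch of anchoring a file to the script’s own location; `resolve_beside` is a hypothetical helper.

```python
from pathlib import Path


def resolve_beside(script_path: str, filename: str) -> Path:
    """Return the absolute path of `filename` located in the same directory as `script_path`."""
    return Path(script_path).expanduser().resolve().parent / filename
```

Inside the script you would call it as `resolve_beside(__file__, 'records.json')`, where the file name is only an example.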
Make the bash script executable with the following command:
```bash
chmod +x <path-to-bash-script>
```
Open crontab in edit mode and add a command to execute the bash script we created. I’ve used the @daily schedule here, which executes the given command every day at midnight.
If you’d like to fine-tune the exact hour of the day that you’d like to execute this command, you can read this article to learn how to achieve that.
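As a quick illustration, @daily is shorthand for the five-field expression `0 0 * * *`, with the fields ordered minute, hour, day of month, month, day of week. The sketch below builds such a line for a chosen time of day. `daily_at` is a hypothetical helper, and the path in the usage note is a placeholder.

```python
# Standard five-field equivalents of common cron nicknames
CRON_NICKNAMES = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def daily_at(hour: int, minute: int, command: str) -> str:
    """Build a crontab line that runs `command` every day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minute hour day-of-month month day-of-week
    return f"{minute} {hour} * * * {command}"
```

For example, `daily_at(9, 30, '/path/to/run.sh')` produces a line that runs the command at 09:30 every day.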
```bash
crontab -e
```
```
@daily <path-to-bash-script>
```
#### Execute python script directly
The second alternative is directly executing the python script from cron. In order to do this, we have to go through the same steps to prepare the python script as we did with the bash script.
We have to make the python script executable first. Navigate to the script’s directory in the terminal and then run the following command:
```bash
chmod +x script.py
```
Once you’ve done that, open the script itself and add a similar line to this at the very top of the file:
```python
#!/Library/Frameworks/Python.framework/Versions/3.7/bin/python3
```
This is called a hashbang or shebang. It is made up of 2 parts: “#!” followed by the location of the interpreter you want to use. This is the result you get when you run the “which python3” command.
A hashbang tells the system how to execute the file. You will notice we have a hashbang at the beginning of the bash script as well. This points to the bash interpreter.
The benefit of using a hashbang is that we don’t have to preface the script name with the interpreter command when executing it in the terminal.
So instead of:
```bash
$ python3 script.py
```
We can do:
```bash
$ ./script.py
```
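Both preparation steps can be checked or applied from Python too. This is a sketch with hypothetical helper names. `make_executable` is only roughly equivalent to `chmod +x`, since it sets the execute bits for everyone regardless of your umask.

```python
import os
import stat


def has_shebang(path: str) -> bool:
    """Check whether the file starts with the two characters '#!'."""
    with open(path, "rb") as file:
        return file.read(2) == b"#!"


def make_executable(path: str) -> None:
    """Add the execute permission for user, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

If `has_shebang('script.py')` returns False, running `./script.py` will not know which interpreter to use, even when the file is executable.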
Once we’ve done that, we can edit the crontab entry to execute the file directly using the following code:
```
@daily /Library/Frameworks/Python.framework/Versions/3.7/bin/python3 ${HOME}/<path-to-python script>
```
Before you save this, remember that we have the hashbang at the top of the script, so we don’t have to specify the interpreter in the crontab line. Let’s update this entry to the following:
```
@daily ${HOME}/<path-to-python script>
```
Hashbangs help us keep our crontab entries much cleaner.
Now the script will run at the scheduled times and specified intervals.