Get Started with Job Scheduling in Python
In our daily life and work, certain tasks need to be repeated over a specific period. For example, you may need to back up your databases and files, check the availability of a service, or generate reports of certain activities. Since these tasks need to be repeated based on a schedule, it is better to automate them using a task scheduler. Many programming languages offer their task scheduling solution, and in this tutorial, we will discuss how to schedule tasks using Python.
Prerequisites
To get started with this tutorial, ensure that you have a computer with Linux and the latest version of Python installed. You can either set up a PC, a virtual machine, a virtual private server, or WSL (if you are using Windows). Also, make sure you log in as a user with root privileges, and you need to have some basic knowledge about Python and using command-line utilities on Linux systems.
Scheduling tasks with Cron Jobs
There are two main ways to schedule tasks using Python. The first method
involves using Python scripts to create jobs that are executed using the cron
command, while the second involves scheduling the task directly with Python. We
will explore both methods in this tutorial.
Start by creating a new working directory on your machine:
mkdir scheduledTasks && cd scheduledTasks
To start creating Cron Jobs with Python, you need to use a package called
python-crontab
. It allows you to
read, write, and access system cron jobs in a Python script using a simplified
syntax. You can install the package with the following command.
pip install python-crontab
Once installed, create a cron.py
file in your working directory. This is where
the code to schedule various tasks will be placed.
nano cron.py
Here is an example of how python-crontab
can be used used to create Cron Jobs:
from crontab import CronTab
cron = CronTab(user=True)
job = cron.new(command="echo 'hello world'")
job.minute.every(1)
cron.write()
First, the CronTab
class is imported and initializes a cron
object. Setting
the user
argument to True
ensures that the current user's crontab
file is
read and manipulated. You can also manipulate other users' crontab
file, but
you need the proper permissions to do so.
my_cron = CronTab(user=True) # My crontab
jack_cron = CronTab(user="jack") # Jack's crontab
A new Cron Job is created by calling the new()
method on the cron
object,
and its command
parameter specifies the shell command you wish to execute.
After creating the job, you need to specify its schedule. In this example, the
job is scheduled to run once every minute. Finally, you must save the job using
the write()
method to write it to the corresponding crontab
file.
Go ahead and execute the program using the following command:
python cron.py
You can check if the Cron Job has been created by running this command:
crontab -l
You should observe the following line at the bottom of the file:
. . .
* * * * * echo 'hello world'
Notice how the readable Python scheduling syntax gets translated to Cron's
cryptic syntax. This is one of the main advantages of using the python-crontab
package instead of editing the crontab
file yourself.
Setting time restrictions
Let's take a closer look at the scheduling options that the python-crontab
package exposes for automating tasks. Recall that a Cron expression has the
following syntax:
minute hour day_of_month month day_of_week command
The minute()
method that we used in the previous example corresponds to the
first field. Each of the other fields (except command
) has their corresponding
method as shown in the list below:
minute
:minute()
hour
:hour()
day_of_month
:day()
month
:month()
day_of_week
:dow()
The command
field corresponds to the command
parameter in the new()
method.
job = cron.new(command="echo 'hello world'")
Once you've specified the unit of time that should be used for scheduling (minute, hour, etc), you must define how often the job should be repeated. This could be a time interval, a frequency, or specific values. There are three different methods to help you with this.
on()
: defines specific values for the task to be repeated and it takes different values for different units. For instance, if the unit isminute
, integer values between 0-59 may be supplied as arguments. If the unit is day of week (dow
), integer values between 0-6 or string valuesSUN
-SAT
may be provided.
Below is a summary of how the on()
method works for various units, and the
corresponding crontab output:
job.minute.on(5) # 5th minute of every hour → 5 * * * *
job.hour.on(5) # 05:00 of every day → * 5 * * *
job.day.on(5) # 5th day of every month → * * 5 * *
job.month.on(5) # May of every year → * * * 5 *
job.month.on("MAY") # May of every year → * * * 5 *
job.dow.on(5) # Every Friday → * * * * 5
job.dow.on("FRI") # Every Friday → * * * * 5
You can also specify multiple values in the on()
method to form a list. This
corresponds to the comma character in a Cron expression.
job.day.on(5, 8, 10, 17) # corresponds to * * 5,8,10,17 * *
every()
: defines the frequency of repetition. Corresponds to the forward slash (/
) in a Cron expression.
job.minute.every(5) # Every 5 minutes → */5 * * * *
during()
: specifies a time interval, which corresponds to the dash (-
) character in a Cron expression. It takes two values to form an interval, and just like theon()
method, the allowable set of values varies according to the unit.
job.minute.during(5,50) # During minute 5 to 50 of every hour
job.dow.during('MON', 'FRI') # Monday to Friday
You can also combine during()
with every()
, which allows you to define a
range and then specify the frequency of repetition. For example:
job.minute.during(5,20).every(5) # Every 5 minutes from minute 5 to 20 → 5-20/5 * * * *
You need to remember that every time you set a schedule, the previous schedule (if any) will be cleared. For instance:
job.month.on(5) # Set to * * * 5 *
job.hour.every(2) # Override the previous schedule and set to * */2 * * *
However, if you need to combine multiple schedules for a simple task, you must
append use the also()
method as shown below:
job.month.on(5) # Set to * * * 5 *
job.hour.also.every(2) # merge with the previous schedule and set to * */2 * 5 *
If you are comfortable using Cron expressions,
there is also a setall()
method that allows you to use either Cron expressions
or Python datetime
objects
like this:
job.setall(None, "*/2", None, "5", None) # None means *
job.setall("* */2 * 5 *")
job.setall(datetime.time(10, 2)) # 2 10 * * *
job.setall(datetime.date(2000, 4, 2)) # * * 2 4 *
job.setall(datetime.datetime(2000, 4, 2, 10, 2)) # 2 10 2 4 *
Scheduling a Python script with python-crontab
In this section, you will create a Python scrapper that scrapes the Dev.to Community for the latest Python articles, sorts them according to their reactions, and saves them to a markdown file. Afterward, you will schedule this scrapper to run once every week using the concepts introduced in prior sections.
Create a scrapper.py
file with the following command:
nano scrapper.py
import re
import requests
import datetime
from bs4 import BeautifulSoup
# Retrieve the web page
URL = "https://dev.to/t/python"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
result = soup.find(id="substories")
# Get all articles
articles = result.find_all("div", class_="crayons-story")
article_result = []
# Get today's date and the date from a week ago
today = datetime.datetime.today()
a_week_ago = today - datetime.timedelta(days=7)
for article in articles:
# Get title and link
title_element = article.find("a", id=re.compile("article-link-\d+"))
title = title_element.text.strip()
link = title_element["href"]
# Get publish date
pub_date_element = article.find("time")
pub_date = pub_date_element.text
# Get number of reactions
reaction_element = article.find(string=re.compile("reaction"))
# If no reaction found, reaction is set to 0
if reaction_element != None:
reaction_element = reaction_element.findParent("a")
reaction = re.findall("\d+", reaction_element.text)
reaction = int(reaction[0])
else:
reaction = 0
# Get publish date in datetime type for comparison
pub = datetime.datetime.strptime(pub_date + str(today.year), "%b %d%Y")
# If an article has more than 5 reactions, and is published less than a week ago,
# the article is added to article_result
if reaction >= 5 and pub > a_week_ago:
article_result.append(
{"title": title, "link": link, "pub_date": pub_date, "reaction": reaction}
)
# Sort articles by number of reactions
article_result = sorted(article_result, key=lambda d: d["reaction"], reverse=True)
# Write the result to python-latest.md
f = open("python-latest.md", "w")
for i in article_result:
f.write("[" + i["title"] + "]")
f.write("(" + "https://dev.to" + i["link"] + ")")
f.write(
" | Published on " + i["pub_date"] + " | " + str(i["reaction"]) + " reactions"
)
f.write("\n\n")
f.close()
This scrapper first uses the requests
package to retrieve the desired webpage.
Next, the BeautifulSoup
package parses the resulting HTML and extracts the
title, link, number of reactions, and the publication date of each article.
Afterward, the scrapper filters out articles that have less than five reactions
or is published over a week ago, and finally, it writes all the remaining
articles into the python-latest.md
file.
Before you execute the program, install the required dependencies using the command below:
pip install beautifulsoup4 requests
Then run the program as follows:
python scrapper.py
A python-latest.md
file will be generated in the current directory. You can
view its contents to verify that the program works:
cat python-latest.md
[2022 Beginner Friendly Modern Data Engineering Career path With Learning Resources.](https://dev.to/grayhat/2022-beginner-friendly-modern-data-engineering-career-path-with-learning-resoures-26bi) | Published on Oct 10 | 13 reactions
[The Zen of Python, Explained](https://dev.to/aacitelli/the-zen-of-python-explained-1f3b) | Published on Oct 11 | 11 reactions
[Protect your Kubernetes Persistent Volumes from accidental deletion.](https://dev.to/tvelmachos/protect-your-kubernetes-persistent-volumes-from-accidental-deletion-3jgj) | Published on Oct 10 | 8 reactions
[Python-tweepy: automating a follow back task using windows scheduler](https://dev.to/wachuka_james/python-tweepy-automating-a-follow-back-task-using-windows-scheduler-4fm0) | Published on Oct 11 | 5 reactions
. . .
Now that the script is ready and proven to work, let's discuss how you can
schedule this script to run periodically, for example, every Saturday at 7:00
AM. Modify your cron.py
file as follows:
from crontab import CronTab
import os
cron = CronTab(user=True)
path = os.path.abspath("./scrapper.py")
job = cron.new(command="python " + "'" + path + "'")
job.minute.on(0)
job.hour.also.on(7)
job.dow.also.on("SAT")
cron.write()
Execute the cron.py
file, and confirm that a corresponding Cron Job was
created using the command below:
crontab -l
. . .
0 7 * * SAT python '<path_to>/scrapper.py'
You can verify the validity of this Cron expression using a handy website called Crontab Guru which allows you to check if your Cron expression is correct.
Managing existing Cron Jobs
Imagine this scenario: you've created several Cron Jobs on your machine, and you
need to update one of them. How can you locate that particular Cron Job and make
the update? While you can edit the crontab
file directly, the python-crontab
package also provides a few different ways to update existing jobs.
First, you can iterate over all the Cron Job lines like this:
cron = CronTab(user=True)
# Iterate over all Cron Jobs
for job in cron:
print(job)
# Iterate over all lines:
for line in cron.lines:
print(line)
This method is highly inefficient since you'll have to iterate over many Jobs or
lines just to find the one you're looking for. The second way to do this is by
using the find_time()
method:
job = cron.find_time("*/2 * * * *")
You need to pass the schedule of the Cron Job to the find_time()
method, and
Python will find that Cron Job for you. However, this method also has problems.
You are unlikely to remember the exact schedule when you have tons of Cron Jobs
on your machine.
The best way to locate a Cron Job is by utilizing comments. To do this, you need to associate each Cron Job with a comment like this:
from crontab import CronTab
cron = CronTab(user=True)
job = cron.new(command="echo 'hello world'")
job.minute.every(1)
job.set_comment("Output hello world")
cron.write()
Execute the cron.py
file again, and a new Cron Job with a comment will be
created.
crontab -l
* * * * * echo "hello world" # Output hello world
And then, you can locate this Cron Job using the find_comment()
method. If you
don't remember the exact comment, you can fuzzy match with
regular expression like this:
jobs = cron.find_comment("Output hello world")
jobs = cron.find_comment(re.compile(" hello \w")) # Fuzzy match the word "hello" using regular expression
One thing to note is that cron.find_comment()
will return a set of objects,
and you need to create a loop to access each of them.
jobs = cron.find_comment(. . .)
for job in jobs:
. . .
After you've located the Cron Job, you can modify its command or the comment:
job.set_command("curl http://www.google.com") # Modify command
job.set_comment("New comment here") # Modify comment
You can also clear a job of its schedule and set a new one:
job.clear()
job.setall(". . .")
It's also possible to disable the Cron Job (by commenting it out), or enable it like this:
job.enable(False) # Disable a Cron Job
job.enable() # Enable a Cron Job
You can also remove a Cron Job entirely:
cron.remove(job)
Support for batch deleting Cron Jobs is also provided with the remove_all()
method:
cron.remove_all() # Remove all Cron Jobs
cron.remove_all("echo") # Remove all Cron Jobs with the command "echo"
cron.remove_all(comment="foo") # Remove all Cron Jobs with the comment "foo"
cron.remove_all(time="*/2") # Remove all Cron Jobs with the schedule */2
Scheduling tasks without Cron Jobs
The second way to schedule tasks with Python is by utilizing a package called
schedule
. This package does not create Cron Jobs on your system, but it
requires you to create a Python script that runs continuously. The advantage of
using schedule
is that you can use it on any operating system, including
Windows, as long as Python is installed on your computer.
To schedule tasks using the schedule
package, create another file called
scheduler.py
:
nano scheduler.py
Place the following code into the file:
import schedule
import time
def task():
print("Job Executing!")
schedule.every().minute.do(task) # Run every minute
while True:
schedule.run_pending()
time.sleep(1)
Notice that there is an infinite loop at the end of this program. This means
that the script will not stop until you manually interrupt it (with Ctrl-C
).
The time.sleep(1)
line will also suspend the execution of the current thread
for one second, meaning this scheduler will run once every second. That's how
the schedule
package is able to determine how many seconds have passed and
when it is time to execute the scheduled tasks. In this example, the task()
is
scheduled to run every minute.
Use the following command to start this scheduler:
python scheduler.py
Wait for the task to execute and you'll get the output:
Job Executing!
Job Executing!
Job Executing!
. . .
To prevent the script from terminating when the terminal is closed, you can opt
to run it in the background. First, ensure you are not using print()
to
display your output because it will only run in the foreground. Instead, you can
save the program's output to a file like this:
import schedule
import time
def task():
f = open("task.txt", "a")
f.write("Job Executing!\n")
f.close()
schedule.every().minute.do(task) # Run every minute
while True:
schedule.run_pending()
time.sleep(1)
Next, execute this script with an ampersand (&
) at the end of the command:
python scheduler.py &
[1] <pid>
This will start scheduler.py
as a new process in the background, and you will
receive its process ID (PID) afterward. Then, wait for the scheduled task to
execute, and you should observe that a task.txt
file is now present in your
working directory.
cat task.txt
Job Executing!
Job Executing!
Job Executing!
Job Executing!
You can use the returned PID to kill the process like this:
sudo kill -9 <pid>
Setting schedules
Besides every minute, you have several options to choose from when it comes to how often the task should be executed.
schedule.every().second.do(task) # Run every second
schedule.every().minute.do(task) # Run every minute
schedule.every().hour.do(task) # Run every hour
schedule.every().day.do(task) # Run every day
schedule.every().week.do(task) # Run every week
You can also schedule the task to run every n
seconds/minutes/hours/days/weeks
like this (notice that the units need to be in plural form in this case):
n = 10
schedule.every(n).seconds.do(task)
schedule.every(n).minutes.do(task)
schedule.every(n).hours.do(task)
schedule.every(n).days.do(task)
schedule.every(n).weeks.do(task)
Sometimes, it is more convenient to schedule tasks based on the day of the week. For example:
schedule.every().monday.do(task) # Run every Monday
schedule.every().tuesday.do(task) # Run every Tuesday
schedule.every().wednesday.do(task) # Run every Wednesday
schedule.every().thursday.do(task) # Run every Thursday
schedule.every().friday.do(task) # Run every Friday
schedule.every().saturday.do(task) # Run every Saturday
schedule.every().sunday.do(task) # Run every Sunday
By default, these tasks will run at 00:00 on the scheduled day, but it is
possible to define a specific time by chaining an at()
method like this:
schedule.every().day.at("10:30:42").do(task) # Runs every day at HH:MM:SS
schedule.every().wednesday.at("13:15").do(job) # Runs every Wednesday at HH:MM
schedule.every().minute.at(":17").do(job) # Runs every minute at :SS
Lastly, you can set an end repeat time using the until()
method. The method
takes either a string or a DateTime
:
schedule.every(1).hours.until("18:30").do(task) # Run every hour until 18:30
schedule.every(1).hours.until("2030-01-01 18:33").do(task) # Run every hour until 2030-01-01, 18:33
schedule.every(1).hours.until(timedelta(hours=8)).do(task) # Run every hour for 8 hours
schedule.every(1).hours.until(time(11, 33, 42)).do(task) # Run every hour until 11:33:42
schedule.every(1).hours.until(datetime(2023, 5, 17, 11, 36, 20)).do(task) # Run every hour until 2023-5-17, 11:36:20
Now that we've covered the basics of schedule
, let's see how we can use it to
schedule our previous example, scrapper.py
.
import schedule
import time
import scrapper
schedule.every().saturday.at("07:00").do(scrapper)
while True:
schedule.run_pending()
time.sleep(1)
As you can see, schedule
has a much simpler API, as it only takes one line of
code to create the same schedule, allowing the scrapper to run automatically
every Saturday at 7:00.
Managing existing tasks
The schedule
package also offers a few different methods to manage previously
scheduled tasks. For example:
schedule.clear() # Cancel all tasks
schedule.run_all() # Execute all tasks once
Or you can cancel a specific job by assigning it to a variable and then use the
cancel_job()
method:
job = schedule.every(n).hours.do(task)
schedule.cancel_job(job)
If you need to create a job that only runs once, make the scheduled job return
CancelJob
. After task()
has been executed the first time, it will be
canceled automatically.
import schedule
import time
def task():
print("Job Executing!")
return schedule.CancelJob
schedule.every().minute.do(task)
while True:
schedule.run_pending()
time.sleep(1)
If you need to manage multiple jobs together, you can give them tags using the
tag()
method. Notice that you can assign multiple tags to one job.
schedule.every().day.do(task).tag("daily", "work")
schedule.every().hour.do(task).tag("hourly", "work")
schedule.every().hour.do(task).tag("hourly", "personal")
schedule.every().day.do(task).tag("daily", "other")
Then you can manage them using the get_jobs()
method, or cancel them using the
clear()
method:
daily = schedule.get_jobs("daily") # Get all daily tasks
schedule.clear("work") # Cancel all work tasks
Monitoring Python scheduled tasks with Better Uptime
Regardless of the method you choose to schedule your recurring tasks, you need to set up a monitoring system that will notify you when a job fails to run as scheduled. There are several ways to do this, but one of the most accessible options involves using a cloud monitoring tool, and that is what we'll explore in this section.
Better Uptime is a server monitoring tool that offers scheduled task monitoring services. It allows you to set up an alert for your scheduled jobs, and when the job is down for some reason, Better Uptime will notify you and your team through emails, SMS texts, or phone calls based on your selection. This section will discuss using Better Uptime to monitor your scheduled Python tasks.
First, you must create a free Better Uptime account if you don't have one already. Once signed in, go to Heartbeats, which is Better Uptime's Cron monitoring service, and create a new heartbeat.
Then choose an appropriate name for your monitor and select how often you expect this scheduled job to be repeated. Next, in the On-call escalation section, pick how you wish to be notified when the scheduled task fails to execute. After you are done, click Create heartbeat.
Next, you should see this page:
Notice the highlighted section in the middle. This URL is what Better Uptime uses to monitor your scheduled jobs. Every time a scheduled task executes, you should ensure that a HEAD, GET, or POST request is made to this URL.
For example, if you are using python-crontab
, head back to the cron.py
file
and edit the command part of the Cron Job (ensure curl
is installed on your
machine):
. . .
path = os.path.abspath("./scrapper.py")
job = cron.new(command="python " + "'" + path + "'" + " && curl https://betteruptime.com/api/v1/heartbeat/<api_key>")
job.set_comment("Dev.to Scrapper")
. . .
Now, execute the cron.py
file and go to the Better Uptime monitor you just
created. Wait for a minute, and once Better Uptime starts receiving requests,
the monitor will be marked as "Up", which means the Cron Job is up and running.
If you disable this Cron Job, which simulates an incident:
. . .
jobs = cron.find_comment("Dev.to Scrapper")
for job in jobs:
job.enable(False)
Execute the script again and wait for a few minutes. If Better Uptime does not receive a request within the time frame you just configured, the monitor will be marked as "Down", which means an incident occurred.
You will also receive an alert in the configured channels:
Since this particular Cron Job will execute a Python script, you can also make a request to Better Uptime in the Python script instead:
. . .
requests.get(
"https://betteruptime.com/api/v1/heartbeat/<api_key>"
)
This way, when you are scheduling multiple scripts in one Cron Job, you will be able to monitor each script separately. So that you'll know exactly which script is not working when something goes wrong.
If you are using the schedule
package, you need to make a request to
betteruptime.com
within the task()
method:
. . .
def task():
# print("Job Executing!")
f = open("task.md", "a")
f.write("Job Executing!")
f.close()
requests.get("https://betteruptime.com/api/v1/heartbeat/<api_key>")
Or you can make a request in the scrapper.py
file, as discussed before.
Final thoughts
In this tutorial, we discussed scheduling Python scripts using python-crontab
and schedule
. The python-crontab
package utilizes Cron under the hood, which
is only available on Unix-like systems. On the other hand, schedule
has a much
simpler API, and it will work on any operating system with Python installed. It
does have a few
limitations
of its own, making it less powerful overall than python-crontab
.
To dig deeper into task automation using Cron, we recommend learning more about job scheduling on Linux systems. If you are unsure about the best database match check our PostgreSQL guide. Thanks for reading, and happy scheduling!
Make your mark
Join the writer's program
Are you a developer and love writing and sharing your knowledge with the world? Join our guest writing program and get paid for writing amazing technical guides. We'll get them to the right readers that will appreciate them.
Write for usBuild on top of Better Stack
Write a script, app or project on top of Better Stack and share it with the world. Make a public repository and share it with us at our email.
community@betterstack.comor submit a pull request and help us build better products for everyone.
See the full list of amazing projects on github