[go: up one dir, main page]

Back to Scaling Python Applications guides

Get Started with Job Scheduling in Python

Eric Hu
Updated on January 29, 2024

In our daily life and work, certain tasks need to be repeated over a specific period. For example, you may need to back up your databases and files, check the availability of a service, or generate reports of certain activities. Since these tasks need to be repeated based on a schedule, it is better to automate them using a task scheduler. Many programming languages offer their task scheduling solution, and in this tutorial, we will discuss how to schedule tasks using Python.

Prerequisites

To get started with this tutorial, ensure that you have a computer with Linux and the latest version of Python installed. You can either set up a PC, a virtual machine, a virtual private server, or WSL (if you are using Windows). Also, make sure you log in as a user with root privileges, and you need to have some basic knowledge about Python and using command-line utilities on Linux systems.

Scheduling tasks with Cron Jobs

There are two main ways to schedule tasks using Python. The first method involves using Python scripts to create jobs that are executed using the cron command, while the second involves scheduling the task directly with Python. We will explore both methods in this tutorial.

Start by creating a new working directory on your machine:

 
mkdir scheduledTasks && cd scheduledTasks

To start creating Cron Jobs with Python, you need to use a package called python-crontab. It allows you to read, write, and access system cron jobs in a Python script using a simplified syntax. You can install the package with the following command.

 
pip install python-crontab

Once installed, create a cron.py file in your working directory. This is where the code to schedule various tasks will be placed.

 
nano cron.py

Here is an example of how python-crontab can be used used to create Cron Jobs:

cron.py
from crontab import CronTab

cron = CronTab(user=True)

job = cron.new(command="echo 'hello world'")
job.minute.every(1)

cron.write()

First, the CronTab class is imported and initializes a cron object. Setting the user argument to True ensures that the current user's crontab file is read and manipulated. You can also manipulate other users' crontab file, but you need the proper permissions to do so.

 
my_cron = CronTab(user=True) # My crontab
jack_cron = CronTab(user="jack") # Jack's crontab

A new Cron Job is created by calling the new() method on the cron object, and its command parameter specifies the shell command you wish to execute. After creating the job, you need to specify its schedule. In this example, the job is scheduled to run once every minute. Finally, you must save the job using the write() method to write it to the corresponding crontab file.

Go ahead and execute the program using the following command:

 
python cron.py

You can check if the Cron Job has been created by running this command:

 
crontab -l

You should observe the following line at the bottom of the file:

Output
. . .
* * * * * echo 'hello world'

Notice how the readable Python scheduling syntax gets translated to Cron's cryptic syntax. This is one of the main advantages of using the python-crontab package instead of editing the crontab file yourself.

Setting time restrictions

Let's take a closer look at the scheduling options that the python-crontab package exposes for automating tasks. Recall that a Cron expression has the following syntax:

 
minute hour day_of_month month day_of_week command

The minute() method that we used in the previous example corresponds to the first field. Each of the other fields (except command) has their corresponding method as shown in the list below:

  • minute: minute()
  • hour: hour()
  • day_of_month: day()
  • month: month()
  • day_of_week: dow()

The command field corresponds to the command parameter in the new() method.

 
job = cron.new(command="echo 'hello world'")

Once you've specified the unit of time that should be used for scheduling (minute, hour, etc), you must define how often the job should be repeated. This could be a time interval, a frequency, or specific values. There are three different methods to help you with this.

  • on(): defines specific values for the task to be repeated and it takes different values for different units. For instance, if the unit is minute, integer values between 0-59 may be supplied as arguments. If the unit is day of week (dow), integer values between 0-6 or string values SUN-SAT may be provided.

Below is a summary of how the on() method works for various units, and the corresponding crontab output:

 
  job.minute.on(5) # 5th minute of every hour → 5 * * * *

  job.hour.on(5) # 05:00 of every day → * 5 * * *

  job.day.on(5) # 5th day of every month → * * 5 * *

  job.month.on(5) # May of every year → * * * 5 *
  job.month.on("MAY") # May of every year → * * * 5 *

  job.dow.on(5) # Every Friday → * * * * 5
  job.dow.on("FRI") # Every Friday → * * * * 5

You can also specify multiple values in the on() method to form a list. This corresponds to the comma character in a Cron expression.

 
  job.day.on(5, 8, 10, 17) # corresponds to * * 5,8,10,17 * *
  • every(): defines the frequency of repetition. Corresponds to the forward slash (/) in a Cron expression.
 
  job.minute.every(5) # Every 5 minutes → */5 * * * *
  • during(): specifies a time interval, which corresponds to the dash (-) character in a Cron expression. It takes two values to form an interval, and just like the on() method, the allowable set of values varies according to the unit.
 
  job.minute.during(5,50) # During minute 5 to 50 of every hour
  job.dow.during('MON', 'FRI') # Monday to Friday

You can also combine during() with every(), which allows you to define a range and then specify the frequency of repetition. For example:

 
  job.minute.during(5,20).every(5) # Every 5 minutes from minute 5 to 20 → 5-20/5 * * * *

You need to remember that every time you set a schedule, the previous schedule (if any) will be cleared. For instance:

 
job.month.on(5) # Set to * * * 5 *
job.hour.every(2) # Override the previous schedule and set to * */2 * * *

However, if you need to combine multiple schedules for a simple task, you must append use the also() method as shown below:

 
job.month.on(5) # Set to * * * 5 *
job.hour.also.every(2) # merge with the previous schedule and set to * */2 * 5 *

If you are comfortable using Cron expressions, there is also a setall() method that allows you to use either Cron expressions or Python datetime objects like this:

 
job.setall(None, "*/2", None, "5", None) # None means *
job.setall("* */2 * 5 *")

job.setall(datetime.time(10, 2)) # 2 10 * * *
job.setall(datetime.date(2000, 4, 2)) # * * 2 4 *
job.setall(datetime.datetime(2000, 4, 2, 10, 2)) # 2 10 2 4 *

Scheduling a Python script with python-crontab

In this section, you will create a Python scrapper that scrapes the Dev.to Community for the latest Python articles, sorts them according to their reactions, and saves them to a markdown file. Afterward, you will schedule this scrapper to run once every week using the concepts introduced in prior sections.

Create a scrapper.py file with the following command:

 
nano scrapper.py
scrapper.py
import re
import requests
import datetime
from bs4 import BeautifulSoup

# Retrieve the web page
URL = "https://dev.to/t/python"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
result = soup.find(id="substories")

# Get all articles
articles = result.find_all("div", class_="crayons-story")

article_result = []
# Get today's date and the date from a week ago
today = datetime.datetime.today()
a_week_ago = today - datetime.timedelta(days=7)

for article in articles:

    # Get title and link
    title_element = article.find("a", id=re.compile("article-link-\d+"))
    title = title_element.text.strip()
    link = title_element["href"]

    # Get publish date
    pub_date_element = article.find("time")
    pub_date = pub_date_element.text

    # Get number of reactions
    reaction_element = article.find(string=re.compile("reaction"))

    # If no reaction found, reaction is set to 0
    if reaction_element != None:
        reaction_element = reaction_element.findParent("a")
        reaction = re.findall("\d+", reaction_element.text)
        reaction = int(reaction[0])
    else:
        reaction = 0

    # Get publish date in datetime type for comparison
    pub = datetime.datetime.strptime(pub_date + str(today.year), "%b %d%Y")

    # If an article has more than 5 reactions, and is published less than a week ago,
    # the article is added to article_result
    if reaction >= 5 and pub > a_week_ago:
        article_result.append(
            {"title": title, "link": link, "pub_date": pub_date, "reaction": reaction}
        )

# Sort articles by number of reactions
article_result = sorted(article_result, key=lambda d: d["reaction"], reverse=True)

# Write the result to python-latest.md
f = open("python-latest.md", "w")

for i in article_result:

    f.write("[" + i["title"] + "]")
    f.write("(" + "https://dev.to" + i["link"] + ")")
    f.write(
        " | Published on " + i["pub_date"] + " | " + str(i["reaction"]) + " reactions"
    )
    f.write("\n\n")

f.close()

This scrapper first uses the requests package to retrieve the desired webpage. Next, the BeautifulSoup package parses the resulting HTML and extracts the title, link, number of reactions, and the publication date of each article. Afterward, the scrapper filters out articles that have less than five reactions or is published over a week ago, and finally, it writes all the remaining articles into the python-latest.md file.

Before you execute the program, install the required dependencies using the command below:

 
pip install beautifulsoup4 requests

Then run the program as follows:

 
python scrapper.py

A python-latest.md file will be generated in the current directory. You can view its contents to verify that the program works:

 
cat python-latest.md
Output
[2022 Beginner Friendly Modern Data Engineering Career path With Learning Resources.](https://dev.to/grayhat/2022-beginner-friendly-modern-data-engineering-career-path-with-learning-resoures-26bi) | Published on Oct 10 | 13 reactions

[The Zen of Python, Explained](https://dev.to/aacitelli/the-zen-of-python-explained-1f3b) | Published on Oct 11 | 11 reactions

[Protect your Kubernetes Persistent Volumes from accidental deletion.](https://dev.to/tvelmachos/protect-your-kubernetes-persistent-volumes-from-accidental-deletion-3jgj) | Published on Oct 10 | 8 reactions

[Python-tweepy: automating a follow back task using windows scheduler](https://dev.to/wachuka_james/python-tweepy-automating-a-follow-back-task-using-windows-scheduler-4fm0) | Published on Oct 11 | 5 reactions

. . .

Now that the script is ready and proven to work, let's discuss how you can schedule this script to run periodically, for example, every Saturday at 7:00 AM. Modify your cron.py file as follows:

cron.py
from crontab import CronTab
import os

cron = CronTab(user=True)

path = os.path.abspath("./scrapper.py")
job = cron.new(command="python " + "'" + path + "'")
job.minute.on(0)
job.hour.also.on(7)
job.dow.also.on("SAT")

cron.write()

Execute the cron.py file, and confirm that a corresponding Cron Job was created using the command below:

 
crontab -l
Output
. . .
0 7 * * SAT python '<path_to>/scrapper.py'

You can verify the validity of this Cron expression using a handy website called Crontab Guru which allows you to check if your Cron expression is correct.

Crontab Guru

Managing existing Cron Jobs

Imagine this scenario: you've created several Cron Jobs on your machine, and you need to update one of them. How can you locate that particular Cron Job and make the update? While you can edit the crontab file directly, the python-crontab package also provides a few different ways to update existing jobs.

First, you can iterate over all the Cron Job lines like this:

 
cron = CronTab(user=True)

# Iterate over all Cron Jobs
for job in cron:
    print(job)

# Iterate over all lines:
for line in cron.lines:
    print(line)

This method is highly inefficient since you'll have to iterate over many Jobs or lines just to find the one you're looking for. The second way to do this is by using the find_time() method:

 
job = cron.find_time("*/2 * * * *")

You need to pass the schedule of the Cron Job to the find_time() method, and Python will find that Cron Job for you. However, this method also has problems. You are unlikely to remember the exact schedule when you have tons of Cron Jobs on your machine.

The best way to locate a Cron Job is by utilizing comments. To do this, you need to associate each Cron Job with a comment like this:

cron.py
from crontab import CronTab

cron = CronTab(user=True)

job = cron.new(command="echo 'hello world'")
job.minute.every(1)
job.set_comment("Output hello world")
cron.write()

Execute the cron.py file again, and a new Cron Job with a comment will be created.

 
crontab -l
Output
* * * * * echo "hello world" # Output hello world

And then, you can locate this Cron Job using the find_comment() method. If you don't remember the exact comment, you can fuzzy match with regular expression like this:

 
jobs = cron.find_comment("Output hello world")
jobs = cron.find_comment(re.compile(" hello \w")) # Fuzzy match the word "hello" using regular expression

One thing to note is that cron.find_comment() will return a set of objects, and you need to create a loop to access each of them.

 
jobs = cron.find_comment(. . .)

for job in jobs:
    . . .

After you've located the Cron Job, you can modify its command or the comment:

 
job.set_command("curl http://www.google.com") # Modify command
job.set_comment("New comment here") # Modify comment

You can also clear a job of its schedule and set a new one:

 
job.clear()
job.setall(". . .")

It's also possible to disable the Cron Job (by commenting it out), or enable it like this:

 
job.enable(False) # Disable a Cron Job
job.enable() # Enable a Cron Job

You can also remove a Cron Job entirely:

 
cron.remove(job)

Support for batch deleting Cron Jobs is also provided with the remove_all() method:

 
cron.remove_all() # Remove all Cron Jobs
cron.remove_all("echo") # Remove all Cron Jobs with the command "echo"
cron.remove_all(comment="foo") # Remove all Cron Jobs with the comment "foo"
cron.remove_all(time="*/2") # Remove all Cron Jobs with the schedule */2

Scheduling tasks without Cron Jobs

The second way to schedule tasks with Python is by utilizing a package called schedule. This package does not create Cron Jobs on your system, but it requires you to create a Python script that runs continuously. The advantage of using schedule is that you can use it on any operating system, including Windows, as long as Python is installed on your computer.

To schedule tasks using the schedule package, create another file called scheduler.py:

 
nano scheduler.py

Place the following code into the file:

scheduler.py
import schedule
import time

def task():
    print("Job Executing!")

schedule.every().minute.do(task) # Run every minute

while True:
    schedule.run_pending()
    time.sleep(1)

Notice that there is an infinite loop at the end of this program. This means that the script will not stop until you manually interrupt it (with Ctrl-C). The time.sleep(1) line will also suspend the execution of the current thread for one second, meaning this scheduler will run once every second. That's how the schedule package is able to determine how many seconds have passed and when it is time to execute the scheduled tasks. In this example, the task() is scheduled to run every minute.

Use the following command to start this scheduler:

 
python scheduler.py

Wait for the task to execute and you'll get the output:

Output
Job Executing!
Job Executing!
Job Executing!
. . .

To prevent the script from terminating when the terminal is closed, you can opt to run it in the background. First, ensure you are not using print() to display your output because it will only run in the foreground. Instead, you can save the program's output to a file like this:

scheduler.py
import schedule
import time

def task():
f = open("task.txt", "a")
f.write("Job Executing!\n")
f.close()
schedule.every().minute.do(task) # Run every minute while True: schedule.run_pending() time.sleep(1)

Next, execute this script with an ampersand (&) at the end of the command:

 
python scheduler.py &
Output
[1] <pid>

This will start scheduler.py as a new process in the background, and you will receive its process ID (PID) afterward. Then, wait for the scheduled task to execute, and you should observe that a task.txt file is now present in your working directory.

 
cat task.txt
Output
Job Executing!
Job Executing!
Job Executing!
Job Executing!

You can use the returned PID to kill the process like this:

 
sudo kill -9 <pid>

Setting schedules

Besides every minute, you have several options to choose from when it comes to how often the task should be executed.

 
schedule.every().second.do(task) # Run every second
schedule.every().minute.do(task) # Run every minute
schedule.every().hour.do(task) # Run every hour
schedule.every().day.do(task) # Run every day
schedule.every().week.do(task) # Run every week

You can also schedule the task to run every n seconds/minutes/hours/days/weeks like this (notice that the units need to be in plural form in this case):

 
n = 10

schedule.every(n).seconds.do(task)
schedule.every(n).minutes.do(task)
schedule.every(n).hours.do(task)
schedule.every(n).days.do(task)
schedule.every(n).weeks.do(task)

Sometimes, it is more convenient to schedule tasks based on the day of the week. For example:

 
schedule.every().monday.do(task) # Run every Monday
schedule.every().tuesday.do(task) # Run every Tuesday
schedule.every().wednesday.do(task) # Run every Wednesday
schedule.every().thursday.do(task) # Run every Thursday
schedule.every().friday.do(task) # Run every Friday
schedule.every().saturday.do(task) # Run every Saturday
schedule.every().sunday.do(task) # Run every Sunday

By default, these tasks will run at 00:00 on the scheduled day, but it is possible to define a specific time by chaining an at() method like this:

 
schedule.every().day.at("10:30:42").do(task) # Runs every day at HH:MM:SS
schedule.every().wednesday.at("13:15").do(job) # Runs every Wednesday at HH:MM
schedule.every().minute.at(":17").do(job) # Runs every minute at :SS

Lastly, you can set an end repeat time using the until() method. The method takes either a string or a DateTime:

 
schedule.every(1).hours.until("18:30").do(task) # Run every hour until 18:30
schedule.every(1).hours.until("2030-01-01 18:33").do(task) # Run every hour until 2030-01-01, 18:33
schedule.every(1).hours.until(timedelta(hours=8)).do(task) # Run every hour for 8 hours
schedule.every(1).hours.until(time(11, 33, 42)).do(task) # Run every hour until 11:33:42
schedule.every(1).hours.until(datetime(2023, 5, 17, 11, 36, 20)).do(task) # Run every hour until 2023-5-17, 11:36:20

Now that we've covered the basics of schedule, let's see how we can use it to schedule our previous example, scrapper.py.

scheduler.py
import schedule
import time
import scrapper
schedule.every().saturday.at("07:00").do(scrapper)
while True: schedule.run_pending() time.sleep(1)

As you can see, schedule has a much simpler API, as it only takes one line of code to create the same schedule, allowing the scrapper to run automatically every Saturday at 7:00.

Managing existing tasks

The schedule package also offers a few different methods to manage previously scheduled tasks. For example:

 
schedule.clear() # Cancel all tasks
schedule.run_all() # Execute all tasks once

Or you can cancel a specific job by assigning it to a variable and then use the cancel_job() method:

 
job = schedule.every(n).hours.do(task)
schedule.cancel_job(job)

If you need to create a job that only runs once, make the scheduled job return CancelJob. After task() has been executed the first time, it will be canceled automatically.

 
import schedule
import time

def task():
    print("Job Executing!")
return schedule.CancelJob
schedule.every().minute.do(task) while True: schedule.run_pending() time.sleep(1)

If you need to manage multiple jobs together, you can give them tags using the tag() method. Notice that you can assign multiple tags to one job.

 
schedule.every().day.do(task).tag("daily", "work")
schedule.every().hour.do(task).tag("hourly", "work")
schedule.every().hour.do(task).tag("hourly", "personal")
schedule.every().day.do(task).tag("daily", "other")

Then you can manage them using the get_jobs() method, or cancel them using the clear() method:

 
daily = schedule.get_jobs("daily") # Get all daily tasks

schedule.clear("work") # Cancel all work tasks

Monitoring Python scheduled tasks with Better Uptime

Regardless of the method you choose to schedule your recurring tasks, you need to set up a monitoring system that will notify you when a job fails to run as scheduled. There are several ways to do this, but one of the most accessible options involves using a cloud monitoring tool, and that is what we'll explore in this section.

Better Uptime is a server monitoring tool that offers scheduled task monitoring services. It allows you to set up an alert for your scheduled jobs, and when the job is down for some reason, Better Uptime will notify you and your team through emails, SMS texts, or phone calls based on your selection. This section will discuss using Better Uptime to monitor your scheduled Python tasks.

First, you must create a free Better Uptime account if you don't have one already. Once signed in, go to Heartbeats, which is Better Uptime's Cron monitoring service, and create a new heartbeat.

Better Uptime Heartbeats

Then choose an appropriate name for your monitor and select how often you expect this scheduled job to be repeated. Next, in the On-call escalation section, pick how you wish to be notified when the scheduled task fails to execute. After you are done, click Create heartbeat.

Better Uptime create heartbeat

Next, you should see this page:

Better Uptime heartbeat URL

Notice the highlighted section in the middle. This URL is what Better Uptime uses to monitor your scheduled jobs. Every time a scheduled task executes, you should ensure that a HEAD, GET, or POST request is made to this URL.

For example, if you are using python-crontab, head back to the cron.py file and edit the command part of the Cron Job (ensure curl is installed on your machine):

cron.py
. . .
path = os.path.abspath("./scrapper.py")
job = cron.new(command="python " + "'" + path + "'" + " && curl https://betteruptime.com/api/v1/heartbeat/<api_key>")
job.set_comment("Dev.to Scrapper")
. . .

Now, execute the cron.py file and go to the Better Uptime monitor you just created. Wait for a minute, and once Better Uptime starts receiving requests, the monitor will be marked as "Up", which means the Cron Job is up and running.

Better Uptime heartbeat Running

If you disable this Cron Job, which simulates an incident:

cron.py
. . .
jobs = cron.find_comment("Dev.to Scrapper")


for job in jobs:
    job.enable(False)

Execute the script again and wait for a few minutes. If Better Uptime does not receive a request within the time frame you just configured, the monitor will be marked as "Down", which means an incident occurred.

Better Uptime heartbeat Incident

You will also receive an alert in the configured channels:

Better Uptime heartbeat Alert

Since this particular Cron Job will execute a Python script, you can also make a request to Better Uptime in the Python script instead:

scrapper.py
. . .
requests.get(
    "https://betteruptime.com/api/v1/heartbeat/<api_key>"
)

This way, when you are scheduling multiple scripts in one Cron Job, you will be able to monitor each script separately. So that you'll know exactly which script is not working when something goes wrong.

If you are using the schedule package, you need to make a request to betteruptime.com within the task() method:

scheduler.py
. . .
def task():
    # print("Job Executing!")
    f = open("task.md", "a")
    f.write("Job Executing!")
    f.close()

requests.get("https://betteruptime.com/api/v1/heartbeat/<api_key>")

Or you can make a request in the scrapper.py file, as discussed before.

Final thoughts

In this tutorial, we discussed scheduling Python scripts using python-crontab and schedule. The python-crontab package utilizes Cron under the hood, which is only available on Unix-like systems. On the other hand, schedule has a much simpler API, and it will work on any operating system with Python installed. It does have a few limitations of its own, making it less powerful overall than python-crontab.

To dig deeper into task automation using Cron, we recommend learning more about job scheduling on Linux systems. If you are unsure about the best database match check our PostgreSQL guide. Thanks for reading, and happy scheduling!

Author's avatar
Article by
Eric Hu
Eric is a technical writer with a passion for writing and coding, mainly in Python and PHP. He loves transforming complex technical concepts into accessible content, solidifying understanding while sharing his own perspective. He wishes his content can be of assistance to as many people as possible.
Got an article suggestion? Let us know
Next article
Deploying Django Apps with Docker: A Step-by-Step Guide
This article provides step-by-step instructions for deploying your Django applications using Docker and Docker Compose
Licensed under CC-BY-NC-SA

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Make your mark

Join the writer's program

Are you a developer and love writing and sharing your knowledge with the world? Join our guest writing program and get paid for writing amazing technical guides. We'll get them to the right readers that will appreciate them.

Write for us
Writer of the month
Marin Bezhanov
Marin is a software engineer and architect with a broad range of experience working...
Build on top of Better Stack

Write a script, app or project on top of Better Stack and share it with the world. Make a public repository and share it with us at our email.

community@betterstack.com

or submit a pull request and help us build better products for everyone.

See the full list of amazing projects on github