Using ChatGPT to complete Python crawler code without Python basics.
- July 4, 2023
1. Background:
I am an experienced CSGO player, and buying and selling CSGO game props is my daily routine, so I created an Excel sheet to keep track of my game prop list and profit.
One day on the way to school, I mentioned to my partner that it would be nice to have a crawler program to monitor the price changes of game props for me. I would also never need to update excel sheets again manually. However, we did not have anyone around who could code in python. Then I thought, why can’t I write one myself using ChatGPT? 👀
2. What do I need to prepare during this stage?:
Download the standard compilers for Python. Here I am using Pycharm CE.
Figure1: click the download page on pycharm website
Install third party libraries and packages required for crawlers.
There are many third party libraries that may be used here, you don’t need to install them all, the recommended ones are: requests, openpyxl, BeautifulSoup, os, etc.
pip3 install requests
pip3 install openpyxl
...
3. Using ChatGPT to generate code:
Breaking down your requirements, I would put the prompt in the code box below.
Can you help me with a python code that reads data from a cell in an Excel called CSGO earnings table from a folder on the mac desktop and uses that data to search on a specific website and write the crawled search data to a specific cell in the Excel table?
ChatGPT prompt
import openpyxl
import requests
from bs4 import BeautifulSoup
# Path to the Excel file
excel_file_path = "/Users/your_username/Desktop/folder/CSGO_earnings.xlsx"
# Open the workbook and select the worksheet
workbook = openpyxl.load_workbook(excel_file_path)
worksheet = workbook["Sheet1"] # Replace "Sheet1" with the actual sheet name
# Get the data from a specific cell
cell_value = worksheet["A1"].value # Replace "A1" with the desired cell reference
# Perform the search on the website
search_url = "https://www.example.com/search?q=" + cell_value # Replace with the actual search URL
response = requests.get(search_url)
# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")
# Extract the desired data from the website (example: title)
search_results = soup.select(".search-results") # Replace with the appropriate CSS selector
search_data = search_results[0].text.strip() if search_results else "No results found"
# Write the crawled search data to a specific cell
worksheet["B1"].value = search_data # Replace "B1" with the desired cell reference
# Save the changes to the Excel file
workbook.save(excel_file_path)
# Close the workbook
workbook.close()
The solution ideas provided by ChatGPT are certainly different each time, but if you look carefully it is not difficult to find the parts that you can modify yourself.
For instance, make sure to replace “/Users/your_username/Desktop/folder/CSGO_earnings.xlsx” with the actual path to your Excel file, “Sheet1” with the name of your worksheet, and update the search URL and CSS selector according to the specific website you are working with.
After replacing the content in the template, we can try to debug it in pycharm.
We can know the key error is Sheet1 does not exist. So we can open “CSGO_earnings.xlsx” to check it. After doing that, we found that we need to type a space between “sheet” and “1”. So we have finished our debug process.
If you do not know how to debug it yourself, you can copy the error and paste it into the ChatGPT prompt box. I believe that ChatGPT will fix it for you.
4. Developing your code:
If you want to realize more functions, try adding some requirements based on the original prompt.
For instance, I just asked ChatGPT to search data for one specific cell in my pervious prompt. Now, I want to achieve this function for more cells. So I can add some prompts,
Can you help me with a python code that reads the data in cell A2 from an Excel called CSGO Income Statement in a folder on the mac desktop and uses this data to search on a specific website and crawl the search results page for product prices and write the result data to cell J2 in the Excel table and save the file?
Developing ChatGPT prompt
import openpyxl
import requests
from bs4 import BeautifulSoup
import os
# set the path to the Excel file
desktop_path = os.path.join(os.path.expanduser("~"), "Desktop")
file_path = os.path.join(desktop_path, "CSGO earnings table.xlsx")
# load the Excel file and select the worksheet
wb = openpyxl.load_workbook(file_path)
ws = wb.active
# loop through the rows in column A
for i, row in enumerate(ws.iter_rows(min_row=2, max_row=20, max_col=1, values_only=True), start=1):
# get the search term from the current row in column A
search_term = row[0]
if search_term is None or search_term.strip() == "":
# stop the program from running
print(f"Cell A{i + 1} is empty. Stopping program.")
break
# set the URL for the website and add the search term
url = "https://www.c5game.com/csgo?marketKeyword=" + search_term
# send a GET request to the website
response = requests.get(url)
# parse the HTML content of the response using BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")
# find the product prices on the search results page
price = soup.find("div", {"class": 'price'})
# write the price to the corresponding row in column J
if price is not None:
ws.cell(row=i + 1, column=10).value = price.text
print(price.text)
else:
ws.cell(row=i + 1, column=10).value = "N/A"
# save the Excel file
wb.save(file_path)
Although this is only a further implementation of the crawler function for the content of one column of the table, there are many refinements that could be made.