Thanks!
The program I used was WinHTTrack Website Copier (http://www.httrack.com). It's something I've had lying around for a while, so I can't say whether other programs or scripting methods would be better, but it did the job I wanted.
I pointed the program at https://edhrec.com/decks/, hoping to get just the basic average decks, and luckily ended up with all the themes for each commander as well.
So I ended up with a bunch of html files with the commander and theme in the file name. What I wanted isolated in each of them was a div with the class "test_cardlistscontainer". This is the div with just the deck and a couple of "buy this deck here" links. The problem is editing thousands of files with the same regex to strip away all the other html.
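For anyone who would rather do the isolating step in python, here is a rough sketch using only the standard library's html.parser. This is not what I actually did (I used regex replacements in a separate program). The class name DivTextExtractor, the function extract_div_text and the sample html are all made up for illustration, and I'm assuming the target div looks roughly like the sample; the real pages may be structured differently.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # <div> nesting depth inside the target; 0 means outside it
        self.done = False   # set once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1  # a nested div inside the target
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        # Keep only non-blank text found inside the target div
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_div_text(html, target_class="test_cardlistscontainer"):
    parser = DivTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Invented sample page, only to show the call
sample = """<html><body><div class="nav">Menu</div>
<div class="test_cardlistscontainer"><div>1 Sol Ring</div><div>1 Island</div></div>
<div class="footer">Footer</div></body></html>"""
print(extract_div_text(sample))
```

The depth counter is there so that divs nested inside the deck div don't end the capture early, which is the usual way a single regex goes wrong on this kind of page.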
I ended up using another program I had called Ecobyte Replace Text (http://www.ecobyte.com/). I used it to apply regex text replacements across all the files. Once that was done, I decided to take some time and relearn a bit of python to go through all the files and add the tags I need for a Forge commander deck.
The script I used is below. It opens txt files, which I got by using the Bulk Rename Utility (http://www.bulkrenameutility.co.uk) to rename all the files from htm to txt, and afterwards from txt to dck. I'm a Windows user, and while I could have done a lot of this in linux and bash, windows and batch files, or python if I took the time to learn it better, I also like collecting these kinds of utility apps and only using them when I need to.
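If you don't have a bulk renamer handy, the extension swap could probably be done in python too. This is only a sketch of the idea, not something I ran on my files; the function name change_extension is made up, and it assumes no file with the new name already exists in the folder.

```python
from pathlib import Path


def change_extension(folder, old_ext, new_ext):
    """Rename every file in folder whose extension is old_ext so it ends in new_ext.

    Returns the new file names in sorted order.
    """
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the iteration
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() == old_ext:
            target = path.with_suffix(new_ext)
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

You would call it once with ".htm" and ".txt" before adding the deck tags, and again with ".txt" and ".dck" afterwards, passing whatever folder your files are in.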
The python script:
import os

# Folder holding the renamed .txt deck lists (user name redacted)
source_dir = r"C:\Users\------\Desktop\EDH Rec Average Decks"
# Converted decks are written here; the folder must already exist
output_dir = os.path.join(source_dir, "New folder")

for filename in os.listdir(source_dir):
    if not filename.endswith(".txt"):
        continue
    textfile = os.path.join(source_dir, filename)
    # Get the file's name without extension
    name = os.path.splitext(filename)[0]
    print(name)
    # The with blocks close both files automatically
    with open(os.path.join(output_dir, "new" + filename), "w") as newfile:
        newfile.write("[metadata]\n")
        newfile.write("Name=" + name + "\n")
        newfile.write("[Commander]\n")
        with open(textfile, "r") as file:
            file.readline()                 # skip the first line
            newfile.write(file.readline())  # second line goes under [Commander]
            newfile.write("[Main]\n")
            for line in file:               # everything else goes under [Main]
                newfile.write(line)
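To make it clearer what the script above produces, here is the same transformation as a small function working on a list of lines in memory. The function name to_forge_deck and the sample lines are invented for illustration; it assumes, like the script, that the first line is throwaway and the second line is the commander, and that the lines have had their newlines stripped.

```python
def to_forge_deck(name, lines):
    """Build the deck text: line 0 is discarded, line 1 is the commander, the rest is the main deck."""
    commander, main = lines[1], lines[2:]
    parts = ["[metadata]", "Name=" + name, "[Commander]", commander, "[Main]", *main]
    return "\n".join(parts) + "\n"


# Placeholder lines standing in for one cleaned-up file
print(to_forge_deck("Example", ["skip", "1 Commander Name", "1 Card A", "1 Card B"]))
```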