Teacher’s Challenge: Get and Rearrange Lists from HTML

For this challenge, we had to fetch a list of words from a web page at a given link and print them back to the user in the order “course”, “price”.

Link to download: https://mega.nz/#F!GBtURCJT

[code lang="py"]

'''Program reads a webpage defined by the instructor and then prints a list of the values found, in a set order.'''

#imports the library that accesses online pages
import urllib.request

#stores the response object returned by urlopen in a variable (for HTTP URLs it is an http.client.HTTPResponse), then reads the raw HTML and decodes it into a string variable
webpage = urllib.request.urlopen("https://dl.dropboxusercontent.com/u/18951105/teste.html")
sHtmlRaw = webpage.read().decode("utf-8")

#creates a list that will receive all items
finalList = []

#counts how many <li> items there are in the webpage
iListsQty = sHtmlRaw.count("<li>")

for i in range(0, iListsQty):

    #appends final result when i is an even number
    if i % 2 == 0:
        #sets initial index values according to the first <li> and </li> positions
        iWordsStartingIndex = sHtmlRaw.find("<li>")
        iWordsEndingIndex = sHtmlRaw.find("</li>")

        #finds and appends to list the words between the list tags (len("<li>") is 4)
        sWord = sHtmlRaw[iWordsStartingIndex+4:iWordsEndingIndex]
        finalList.append(sWord)

        #finds and removes the entire line along with list tags; the count of 1 keeps identical lines further down the page
        sAllLine = sHtmlRaw[iWordsStartingIndex:iWordsEndingIndex+5]
        sHtmlRaw = sHtmlRaw.replace(sAllLine, " ", 1)

    #appends final result when i is an odd number
    else:
        #sets initial index values according to R$ and 00</li> positions, since prices always start with R$ and end with 00
        iWordsStartingIndex = sHtmlRaw.find("R$")
        iWordsEndingIndex = sHtmlRaw.find("00</li>")

        #finds and appends to list the price, from R$ up to and including the final 00
        sWord = sHtmlRaw[iWordsStartingIndex:iWordsEndingIndex+2]
        finalList.append(sWord)

        #finds and removes the line along with list tags; the count of 1 keeps identical prices further down the page
        sAllLine = sHtmlRaw[iWordsStartingIndex-3:iWordsEndingIndex+7]
        sHtmlRaw = sHtmlRaw.replace(sAllLine, " ", 1)

print(finalList)

[/code]
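
The index arithmetic above depends on the exact text of the page, so here is a rough alternative sketch that lets Python's built-in html.parser find the <li> items instead. This is not the instructor's solution, just one possible approach. The names ListItemCollector, courses_and_prices and sample_html are made up for this example, and the sample page is invented since the real page is not shown here. It also assumes that a price is any item starting with "R$" and that the first course goes with the first price, the second with the second, and so on.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collects the text found inside each <li> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []          # text of every finished <li>
        self._inside_li = False  # True while we are between <li> and </li>
        self._buffer = []        # text pieces of the <li> being read

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._inside_li = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._inside_li:
            self.items.append("".join(self._buffer).strip())
            self._inside_li = False

    def handle_data(self, data):
        if self._inside_li:
            self._buffer.append(data)


def courses_and_prices(html):
    """Returns the <li> texts interleaved as course, price, course, price."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()

    # Assumption: a price is any item that starts with "R$"
    prices = [item for item in parser.items if item.startswith("R$")]
    courses = [item for item in parser.items if not item.startswith("R$")]

    # Assumption: the nth course belongs with the nth price
    ordered = []
    for course, price in zip(courses, prices):
        ordered.extend([course, price])
    return ordered


# Hypothetical sample page, invented for illustration only
sample_html = """
<ul>
<li>R$ 150,00</li>
<li>Python</li>
<li>Java</li>
<li>R$ 200,00</li>
</ul>
"""

print(courses_and_prices(sample_html))
```

To try it on a real page, the decoded string from urllib.request (sHtmlRaw in the program above) could be passed to courses_and_prices in place of sample_html.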
