It’s called scraping and can be used for good or evil
Web scraping is the act of extracting data, information, or images from a website using an automated method. Think of it as copy and paste on full automatic.
We either write or use an app to go to the websites we want it to and make a copy of the specific things we want from those websites. It’s much more precise than downloading an entire website.
Like any tool, web scraping can be used for good or evil.
Some of the better reasons for scraping websites would be ranking it in a
search engine based on its content, price comparison shopping, or monitoring
stock market information. You might even use it as a research tool of sorts.
How Can I Scrape Websites with Excel?
Believe it or not, Excel has had the ability to extract data
from websites for a long time, at least since Excel 2003. It’s just that web
scraping is something most people don’t think of, let alone think of using a
spreadsheet program to do the job. But it’s surprisingly easy, and powerful.
Let’s learn how it’s done by making a collection of Microsoft Office keyboard
Find the Sites You Want to Scrape
The first thing we’re going to do is find the specific web pages from which we want to get information. Let’s go to the source and search at https://support.office.com/. We’re going to use the search term “frequently used shortcuts”. We can make it more specific by using the name of the specific app, like Outlook, Excel, Word, and so on. It may be a good idea to bookmark the results page so we can easily get back there.
Click on the search result, “Keyboard shortcuts in Excel for
Windows”. Once on that page, find the list of Excel versions and click on Newer Versions. Now we’re working with
the latest and greatest.
We could go back to our search results page and open the
results for all the other Office apps in their own tabs and bookmark them. It’s
a good idea, even for this exercise. This is where most people would stop in
collecting Office shortcuts, but not us. We’re going to put them in Excel so we
can do whatever we want with them, whenever we want.
Open Excel and Scrape
Open Excel and start a new workbook. Save the workbook as Office Shortcuts. If you have OneDrive,
save it there so the AutoSave feature
Once the workbook is saved, click on the Data tab.
In the ribbon of the Data tab, click on From Web.
The From Web wizard
window will open. This is where we put the web address or URL of the website
from which we want to scrape data. Switch to your web browser and copy the URL.
Paste the URL in to the URL
field of the From Web wizard. We could choose to use this in Basic or Advanced mode. The Advanced mode gives us a lot more options on how
to access the data from the website. For this exercise, we only need Basic
mode. Click OK.
Excel will now attempt to connect to the website. This may
take a few seconds. We’ll see a progress window, if it does.
window will open, and we’ll see a list of tables from the website on the left.
When we select one, we’ll see a table preview on the right. Let’s select the Frequently used shortcuts table.
We can click on the Web
View tab to see the actual website, if we need to look around for the table
we want. When we find it, we can click on it and it will be selected for
Now, we click on the Load
button at the bottom of this window. There are other options we could choose,
which are more complex and beyond the scope of doing our first scrape. Just be
aware that they are there. Excel’s web scraping capabilities are very powerful.
The web table will be loaded in Excel after a few seconds.
We’ll see the data on the left, where the number 1 is in the picture below. Number 2 highlights the Query
used to get the data from the website. When we have multiple queries in a work
book, this is where we select the one we need to use.
Notice that the data comes into the spreadsheet as an Excel
table. It’s already set up for us to be able to filter or sort the data.
We can repeat this process for all the other web pages that
have the Office shortcuts that we want for Outlook, Word, Access, PowerPoint,
and any other Office app.
Keeping Scraped Data Current in Excel
As a bonus for you, we’re going to learn how to keep our
scraped data fresh in Excel. This is a great way to illustrate just how
powerful Excel is for data scraping. Even with this, we’re only doing the most
basic scraping that Excel can do.
For this example, let’s use a stock information web page like https://www.cnbc.com/stocks/.
Go through what we did before and copy and paste the new URL from the address bar.
You’ll get to the Navigator window and see the tables available. Let’s select the Major U.S. Stock Indices.
Once the data is scraped we’ll see the following
On the right, we see the query for Major U.S. Stock Indexes.
Select that so it is highlighted. Make sure we’re in the Table Tools tab and in the Design
area. Then click on the down arrow under Refresh.
Then click on Connection Properties.
In the Query
Properties window, under the Usage
tab, we can control how this information refreshes. We can set a specific time
period to refresh, or to refresh when we open the workbook the next time, or to
refresh in the background, or any combination of these. Once we choose what we
need, click on OK to close the
window and continue.
That’s it! Now you can track stock prices, sports scores, or any other data that changes frequently from an Excel spreadsheet. If you’re good with Excel equations and functions, you can do almost anything you want with the data.
Maybe try to identify stock trends, run a fantasy sports pool at work, or maybe just keep track of the weather. Who knows? Your imagination and the data available on the Internet, are the only limits.