There are many ways to download resources from the Internet. Besides your browser, you can use a tool like wget to fetch resources from the web while you do something else. In this article, we'll show you how to install and use wget on your Mac.
What Is wget (and What Is It Used For)?
For the uninitiated, wget is an open-source, non-interactive command line utility that downloads resources from a specified URL. Because it's non-interactive, wget can keep working in the background, even after you log out.
It's a GNU project, and it's great if you have a slow or unstable internet connection: wget is designed to be robust, retrying and resuming downloads under otherwise suboptimal conditions.
Once you've installed wget, you run commands that specify a source URL and a destination for your files. We'll show you how below.
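Because wget is non-interactive, you can push a long download into the background and log its progress instead of watching the terminal. Here is a minimal sketch using wget's -b (background) and -o (log file) options; the URL and log filename are placeholders, and to keep the sketch self-contained it only assembles and prints the command rather than contacting the network:

```shell
# Placeholder URL for illustration only.
URL="https://example.com/large-file.iso"
# -b backgrounds wget immediately; -o writes its progress to a log file.
CMD="wget -b -o download.log $URL"
echo "$CMD"
# After launching this for real, you can follow progress with: tail -f download.log
```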
How to Install and Use wget on Mac
Before installing wget, you need a package manager. wget doesn't come with macOS, but you can download and install it using Homebrew, the most popular package manager for the Mac.
1. Download and Install Homebrew
To install Homebrew, first open a terminal window and run the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
This uses the curl command, which comes pre-installed on macOS, to download Homebrew's install script and pass it to bash.
As soon as you hit Enter to run the command, the installation program will give you detailed information on what to do next.
After you confirm, the installer will run.
2. Install wget From the Command Line
Next, we'll use Homebrew to install wget itself. From the terminal, run:
brew install wget
The installer gives you live progress updates, and there's little you need to do here; the process is simple and automated. However, if you already had Homebrew installed, be sure to run brew update first to get the latest copies of all your formulae.
Once you see a fresh prompt in your Terminal, you can use wget on your Mac to download resources.
How to Use wget to Download Web Resources
To download a remote resource from a URL using wget, you must use the following structure:
wget -O path/to/local.copy http://example.com/url/to/download.html
This will save the file specified in the URL to the specified location on your computer.
If you omit the -O flag, the file is saved under its remote name in the current working directory.
For example, to download a web page to the Downloads folder:
wget -O /Users/[your-username]/Downloads/status.html https://www.w3.org/Status.html
To do the same without the -O flag, we'd need to change into the target directory (cd ~/Downloads) before running wget, passing only the URL:
wget https://www.w3.org/Status.html
The file is then saved as Status.html, the name it has on the server.
You'll get full details of the download progress, although for small files the transfer finishes so quickly that the output reads more like a summary than a real-time update.
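When you omit -O, wget derives the local filename from the last component of the URL's path. The shell's own parameter expansion can reproduce that default name, which is handy for predicting what a download will be called; this snippet is purely illustrative string handling, not a wget feature:

```shell
# Without -O, wget names the local file after the last component of the URL path.
URL="https://www.w3.org/Status.html"
DEFAULT_NAME="${URL##*/}"   # strip everything up to and including the last slash
echo "$DEFAULT_NAME"        # Status.html
```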
How to Download a Directory Recursively
To download an entire directory tree with wget, use the -r (recursive) and -np (no-parent) flags:
wget -e robots=off -r -np https://www.w3.org/History/19921103-hypertext/hypertext/
As a result, wget follows every link found in the documents under the specified directory, recursively downloading the entire URL path.
Also note the -e robots=off option, which tells wget to ignore the restrictions in the site's robots.txt file. Without it, wget obeys robots.txt and may skip files the site disallows, leaving you with a shortened download; only disable robots.txt on sites where you're sure that's appropriate.
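The recursion options compose with others. Here is a sketch of a politer recursive download, assembled and printed rather than executed; the depth limit and delay values are illustrative assumptions, not recommendations from the article:

```shell
URL="https://www.w3.org/History/19921103-hypertext/hypertext/"
# -r recurse, -np never ascend to the parent directory,
# -l 3 limit recursion depth, --wait=1 pause one second between requests.
CMD="wget -e robots=off -r -np -l 3 --wait=1 $URL"
echo "$CMD"
```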
Using Additional Flags with wget
You'll find that wget is a flexible tool, as it supports a number of additional flags. This is ideal if you have specific requirements for your download.
Let's look at two areas in particular: controlling the download process and logging.
Control How wget Will Download Resources
There are many flags to help you configure the download process. Here are just a few of the most useful:
wget -X /absolute/path/to/directory will exclude a specific directory on the remote server.
wget -nH removes the "hostname" directories. In other words, it skips over the primary domain name: in the previous example, wget would skip creating a www.w3.org folder and start the local directory tree with the path below it.
wget --cut-dirs=# skips the specified number of directories down the URL path before saving files. For example, -nH --cut-dirs=1 would change the specified path of "ftp.xemacs.org/pub/xemacs/" into simply "/xemacs/" and reduce the number of empty parent directories in the local download.
wget -R index.html (or wget --reject index.html) will skip any files matching the specified file name; in this case, it excludes all the index files. The pattern can also use the asterisk (*) wildcard, such as "*.png", which would skip all files with the PNG extension.
wget -i file reads the target URLs from an input file, one URL per line. If the input file is an HTML document, add the --force-html flag so wget parses it for links.
wget --no-clobber will not overwrite files that already exist at the destination.
wget --continue will resume downloads of partially downloaded files.
wget -t 10 will try to download the resource up to 10 times before giving up.
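These flags combine well. For a resilient batch job you might read URLs from a file while enabling resuming, no-clobber, and retries; urls.txt is a hypothetical file with one URL per line, and the command is assembled and printed here rather than run:

```shell
# Hypothetical input file: urls.txt, one URL per line.
# --continue resumes partial files, --no-clobber skips existing ones,
# -t 10 retries each URL up to ten times.
CMD="wget --continue --no-clobber -t 10 -i urls.txt"
echo "$CMD"
```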
wget can do more than just control the download process, as it can also create logs for future reference.
Adjust the Level of Logging
The flags below control how much output you get when you use wget.
wget -d enables debugging output.
wget -o path/to/log.txt writes the log to the specified file instead of displaying it in standard output.
wget -q turns off all of wget's output, including error messages.
wget -v explicitly enables wget's default verbose output.
wget --no-verbose turns off log messages but still displays error messages.
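A common pairing is trimming output with --no-verbose while directing it to a file with -o, so the terminal stays clean but failures remain diagnosable from the log. Assembled and printed for illustration; the URL and log filename are placeholders:

```shell
# Placeholder URL for illustration only.
URL="https://example.com/page.html"
# --no-verbose keeps the log concise but retains errors; -o sends it to a file.
CMD="wget --no-verbose -o download.log $URL"
echo "$CMD"
```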
You'll usually want to see what happens during a download, so you may not use these flags as often as the others. However, if you're running a large number of downloads and want to be able to diagnose any issues afterwards, writing a log file is a sound approach.
While you can use your browser or another GUI tool to download web pages and other resources, the command line can save you time. For this kind of work, a tool like wget is both more powerful and more flexible than your browser.