Blocking Remote Web Proxy Servers - June 17, 2007

First posted June 17, 2007. Thanks to David Pickett for providing a starting list of remote web proxy servers.

Background:

If you are using a proxy server with some sort of access rules (BorderManager or other), and especially if you are in a school environment, you probably have people trying to get around your access rules by pointing their browsers to a remote proxy server. You may be blocking some site (www.sex.com, for example), but if a browser tunnels the request through an unblocked remote proxy server, systems like BorderManager, which look at the destination URL rather than content, see only the remote proxy's address and let the traffic through. Even a system that inspects content cannot see it if SSL encrypts the data between the web browser and the remote proxy server.

What can you do? You can try to identify all the remote proxy servers out there and block access to them. If you are using a blocking program like LinkWall (from www.connectotel.com) or SurfControl (www.surfcontrol.com), your job is easier, because you can block a category for remote web proxies, and that category can be automatically updated as new proxies are found.

If you do NOT have such a blocking program, or even if you do, you can try using the list contained below. I invite people to email me (contact details are on my consulting contact web page) with additions or corrections, and I will attempt to keep this list updated.

Locating Problem Sites

I'm reposting some comments made in the Novell Public Forums that describe how some people identify problem web sites.

Checking Log Files

David Pickett posts:

"I'm hoping some of us can consolidate ideas/solutions for the ever-growing number of website proxy sites used to bypass filters. I have been monitoring and inputting new rules as these sites come up. I find using the proxy cache monitor in NWadmin very effective for finding the commonly used sites, and I also use BRDstats [available from tip #21 at this web site] to create the HTML statistics logs of every 10 MB log file I have. We currently do not use a 3rd-party content filter, and so it's very hard to keep pace with the new sites that come up. I noticed that many of the sites are now providing automatic email of new proxy sites to help keep ahead of the filters."

Downloading a Blacklist

Walt Keener responded:

"I'm the network admin for a small to medium sized school district and I've also recently been running into this. The students are constantly coming up with new proxy sites.

As an add-on to BM we also have the Connectotel LinkWall suite. This suite allows us to use the site http://www.urlblacklist.com/ to periodically update the rules. They keep up fairly well, but I still end up adding quite a few sites manually as I watch the activity and see students finding new ones. You can download their blacklist for free if you'd like."

Effectively Analyzing Log Files For Suspicious Patterns

Daniel Griswold responded:

[Craig note: grep is a well-known Linux tool for easily pulling data out of a file based on some search pattern. If you have a Linux host, you already have grep. If you want to use grep in Windows, you can download the free Cygwin program, which provides many Linux commands on a Windows host.]

"I have found an effective way to locate proxy servers using BorderManager logs. Cygwin provides a Win32 port of the GNU grep pattern matching command.

What I have found is 95% of our web traffic is GET requests based on a user typing in a URL or clicking a link. When a user fills in a form/field and submits, it is a POST request. The user enters a URL into a webproxy and then POSTs the value to the server.

One caveat is search engines. Users are constantly POSTing to these sites, so they need to be filtered out. The grep -v parameter specifies a pattern match to exclude.

grep POST logfile.log | grep -v google.com | grep -v yahoo.com | grep -v ask.com > newfile.log

The result is a log file of URLs that are POSTed to. I then look for URLs that appear to be random. (i.e. /cgi-bin/1.php?=8392ksudowUJSD98wyh3sd87SJDHEused89usU2Je39slf ) The pseudo-random string is an encoded URL to pass through the filter.

A second pattern that I grep for is c.myspace.com. Even though a proxy is used, something in MySpace hard references that URL. (It may be a JavaScript that is not being stripped.) Since we block *.myspace.com/* via ACL, I know that a request for c.myspace.com was obtained via proxy. What are the chances a student would enter c.myspace.com as a URL?

A third pattern that I grep for is proxy itself. Be careful when you view this report because legitimate websites often use proxy in the URL. ESPN uses proxy.espn.com extensively, but it is not an anonymizer.

I have also subscribed to the mailing list at www.peacefire.org. When a new proxy site is added using their software, they email the URL to the mailing list.

Rather than posting the proxy list that we use, I have copied the file to my web server.

http://www.nsd.k12.mi.us/~admin/proxy3pty.txt
http://www.nsd.k12.mi.us/~admin/proxybrdr.txt"
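
For admins who would rather not install Cygwin, the three grep searches described above can be approximated in Python. This is only a minimal sketch and is not part of Daniel's post. It treats each log line as plain text, the same way grep does, so it makes no assumption about the BorderManager log layout. The function names, the reason labels, and the "looks random" heuristic (a run of 24 or more letters and digits that mixes upper case, lower case and digits) are all assumptions made for illustration; tune them against your own logs. Unlike grep, the host comparisons here ignore case.

```python
import re

# Search engines to ignore, taken from the grep example; extend for your site.
SEARCH_ENGINES = ("google.com", "yahoo.com", "ask.com")

# Assumed heuristic: a long unbroken run of letters and digits.
_LONG_TOKEN = re.compile(r"[A-Za-z0-9]{24,}")


def looks_encoded(text):
    """Guess whether text contains a pseudo-random (encoded URL) string."""
    for token in _LONG_TOKEN.findall(text):
        has_digit = any(c.isdigit() for c in token)
        has_upper = any(c.isupper() for c in token)
        has_lower = any(c.islower() for c in token)
        if has_digit and has_upper and has_lower:
            return True
    return False


def suspicious_lines(lines, excluded=SEARCH_ENGINES):
    """Yield (reason, line) pairs for lines matching any of the three patterns."""
    for line in lines:
        text = line.rstrip("\n")
        lowered = text.lower()
        # Pattern 1: POST requests, minus the excluded search engines.
        if "POST" in text and not any(site in lowered for site in excluded):
            yield ("post", text)
        # Pattern 2: a blocked MySpace host that only shows up via a proxy.
        if "c.myspace.com" in lowered:
            yield ("myspace", text)
        # Pattern 3: the word "proxy" itself (expect false positives).
        if "proxy" in lowered:
            yield ("proxy", text)
```

To use it, open a log file and loop over suspicious_lines(log_file), then review the "post" hits by eye or pass them through looks_encoded() to bring the random-looking URLs to the top. A line can be reported more than once if it matches more than one pattern.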

BorderManager Access Rules

David Pickett started his campaign to block remote proxies by first setting up Deny URL rules for the following URLs, with wildcards. Note that a Deny URL rule CANNOT be used with HTTPS sites - you must block port 443 with a Deny Port rule to deny SSL sites.

1. Deny URL Rules

2. Block Domains by Port Number

Here is a list of domains you can block on ports 80 AND 443. You will need one Deny Port rule for each port. You will have to split this list into multiple sections, because BorderManager has a 1024k limit for the amount of text you can store in a rule. (This means that you will need at least two rules per port number for the amount of data below.)
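
Splitting a long domain list by hand is tedious, so here is a rough Python sketch of one way to do it. It is an illustration only: the function name is made up, the size limit is passed in as a parameter, and it assumes each domain costs its own length plus one separator character. How BorderManager actually counts the text stored in a rule is not something this sketch knows, so leave yourself some headroom below the real limit.

```python
def split_for_rules(domains, max_bytes):
    """Group domains into lists that each fit within max_bytes of text.

    Assumption: each entry costs its encoded length plus one separator.
    """
    chunks = []
    current = []
    size = 0
    for domain in domains:
        entry = len(domain.encode("utf-8")) + 1  # +1 for the separator
        if entry > max_bytes:
            raise ValueError("single entry exceeds the limit: " + domain)
        # Start a new chunk when adding this entry would overflow the current one.
        if current and size + entry > max_bytes:
            chunks.append(current)
            current = []
            size = 0
        current.append(domain)
        size += entry
    if current:
        chunks.append(current)
    return chunks
```

Each list returned would then be pasted into two rules, one Deny Port rule for port 80 and one for port 443.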

Note: This list contains more than just remote proxies - it was started from a list of blocked sites and includes some sex sites and some dating sites.

Please email me or post feedback in the Novell Public Forums, BorderManager sections, if you have corrections, additions or suggestions for the above.

