First posted June 17, 2007. Thanks to David Pickett for
providing a starting list of remote web proxy servers.
If you are using a proxy server with some sort of access rules
(BorderManager or other), and especially if you are in a school environment,
you probably have people trying to get around your access rules by pointing
their browsers to a remote proxy server. You may be blocking some site
(www.sex.com, for example), but if a browser is tunneling a request through
an unblocked remote proxy server, that traffic is not seen by systems like
BorderManager, which look at the destination URL rather than the content. Even
if you have a system that does inspect content, that content is obscured when
SSL is used to encrypt the data between the web browser and the remote proxy
server.
What can you do? You can try to identify all the remote proxy servers out
there and block access to them. If you are using a blocking program like
LinkWall (from www.connectotel.com) or SurfControl (www.surfcontrol.com), your
job is easier, because you can block a category for remote web proxies, and
that category can be automatically updated as new proxies are found.
If you do NOT have such a blocking program, or even if you do, you can try
using the list below. I invite people to email me (contact details are on my
consulting web page) with additions or corrections, and I will attempt to keep
this list updated.
I'm reposting some comments made in the Novell Public Forums describing how some people have identified problem web sites.
David Pickett posts:
"I hoping maybe some of us can consolidate ideas/solutions to the evergrowing
website proxy site to bypass filters. I have been monitoring and inputing new
rules and these sites come up. I find using the proxy cache monitor NWadmin
very effective on finding the commonly used sites and I also use BRDstats
[available from tip #21 at this web site] to create the HTML statistics logs of
every 10meg log file I have. we currently do not use a 3rd party content
filter, and so therefor it's very hard to keep up the paces with these new
sites that come up. I noticed that many of the sites are now providing
automatic email of new proxy sites to help keep ahead of the filters."
Walt Keener responded:
"I'm the network admin for a small to medium sized school district and I've
also recently been running in to this. The students are constantly coming up
with new proxy sites.
As an add-on to BM we also have the Connectotel LinkWall suite, which allows us
to use the site http://www.urlblacklist.com/ to periodically update the rules.
They keep up fairly well, but I still end up adding quite a few sites manually
myself, and as I watch the activity I see students finding new ones. You can
download their blacklist for free if you'd like."
Daniel Griswold responded:
[Craig note: grep is a well-known Linux tool for easily pulling data out of a
file based on some search pattern. If you have a Linux host, you already have
grep. If you want to use grep in Windows, you can download the free
Cygwin program, which provides many Linux commands on a Windows host.]
"I have found an effective way to locate proxy servers using Bordermanager logs.
Cygwin provides a Win32 port of the GNU grep pattern matching command.
What I have found is that 95% of our web traffic consists of GET requests,
based on a user typing in a URL or clicking a link. When a user fills in a
form/field and submits it, that is a POST request. The user enters a URL into a
web proxy and then POSTs the value to the server.
One caveat is search engines. Users are constantly POSTing to those sites, so
the grep -v parameter is used to specify a pattern to exclude.
grep POST logfile.log | grep -v google.com | grep -v yahoo.com | grep -v ask.com > newfile.log
The result is a log file of URLs that are POSTed to. I then look for URLs that
appear to be random (e.g. /cgi-bin/1.php?=8392ksudowUJSD98wyh3sd87SJDHEused89usU2Je39slf ).
The pseudo-random string is an encoded URL being passed through the filter.
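[Craig note: if you want to automate that eyeball check, a rough sketch like
the following flags POSTed URLs whose query strings contain a long run of
random-looking characters. The log file name and the 30-character threshold
are only examples - adjust them for your own logs.
grep POST logfile.log | grep -E '\?[A-Za-z0-9_=-]{30,}' > suspect.log ]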
A second pattern that I grep for is c.myspace.com. Even when a proxy is
used, something in MySpace hard-references that URL. (It may be JavaScript
that is not being stripped.) Since we block *.myspace.com/* via ACL, I know
that a request for c.myspace.com must have come through a remote proxy. What
are the chances a student would enter c.myspace.com as a URL?
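[Craig note: a minimal example of that check, assuming the proxy log file is
named logfile.log:
grep 'c\.myspace\.com' logfile.log > myspace_hits.log ]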
A third pattern that I grep for is the word proxy itself. Be careful when you
view this report, because legitimate web sites often use proxy in the URL.
ESPN uses proxy.espn.com extensively, but it is not an anonymizer.
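[Craig note: a sketch of that search, again assuming logfile.log, with the
known-legitimate ESPN host excluded the same way Google and Yahoo were
excluded above:
grep -i proxy logfile.log | grep -v proxy.espn.com > proxy_hits.log ]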
I have also subscribed to the mailing list at www.peacefire.org. When a new
proxy site is added using their software, they email the URL to the mailing
list.
Rather than posting the proxy list that we use, I have copied the file to
my web server.
http://www.nsd.k12.mi.us/~admin/proxy3pty.txt
http://www.nsd.k12.mi.us/~admin/proxybrdr.txt"
David Pickett started his campaign to block remote proxies by first setting up Deny URL rules for the following URLs, with wildcards. Note that a Deny URL rule CANNOT be used with HTTPS sites - you must block port 443 with a Deny Port rule to deny SSL sites.
Here is a list of domains you can use to block on both port 80 AND port
443. You will need one Deny Port rule for each. You will have to split
this list into multiple sections, because BorderManager has a 1024k limit on
the amount of text you can store in a rule. (This means that you will need
at least two rules per port number for the amount of data below).
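If you keep the domain list in a plain text file, something like the following
(run on a Linux host or under Cygwin) can chop it into pieces small enough to
paste into separate rules. The file names and the 500-lines-per-piece figure
are only examples - adjust the size to whatever fits within the rule limit.
split -l 500 blocklist.txt blockrule_
Each resulting piece (blockrule_aa, blockrule_ab, and so on) then goes into its
own Deny Port rule.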
Note: This list contains more than just remote proxies - it was started from
a list of blocked sites and includes some sex sites and some dating sites.
Please email me or post feedback in the Novell Public Forums, BorderManager
sections, if you have corrections, additions or suggestions for the above.