Category: Technical Guide

Linewize is a provider of next-generation firewall management, focusing on application-level visibility, classroom controls and advanced content filtering while maintaining student data privacy.

Introduction to VPN based filter avoidance

For as long as network firewalls have existed, network engineers and vendors have been locked in a cat and mouse game against the creators of malware and filtering avoidance applications. Internet filtering vendors are constantly on the back foot in keeping abreast of such providers.

The traditional vendor approach is to constantly decompile and dissect malware and filtering avoidance techniques, looking to find implementation quirks that can identify this malicious traffic. These quirks are then wrapped up in a network signature that uses specific protocol and packet criteria to identify and block unwanted network activity. 

At the same time, the malware creators are doing the opposite, constantly evolving their software to get around firewalls and masquerade as innocuous network traffic. 

In the past 2 years, Linewize has seen adoption of filtering avoidance software such as VPNs increase from 2% of users to between 10% and 40% of users in some educational institutions. This widespread adoption has grown largely unnoticed but has now reached almost epidemic levels, to the extent that in some schools such behaviour has become normalised amongst students.

Educational institutions looking to uphold their duty of care need to make sure that students are not able to use filtering avoidance technology to bypass the school’s filtering systems and access inappropriate content. This is traditionally performed using an application-aware firewall. However, even firewalls that utilise Deep Packet Inspection (DPI) and secure content (SSL) inspection cannot keep up with the rapid pace of change in avoidance techniques, rendering them no longer fit for purpose.

 

Traditional Technologies in use for Filtering Avoidance

Traditional approaches to filtering avoidance generally come in two flavors, and both exhibit patterns that a network firewall can easily identify.

In-browser Proxy Servers

In-browser proxy servers offer users an easy method of viewing websites that would otherwise be filtered. By visiting the proxy website, users are presented with a browser inside a browser. Here, at the cost of being subjected to aggressive advertising and a substantial risk of malware infection, users can enter the URL of another website and their traffic is tunneled through the proxy website’s servers.

 

[Figure: In-browser proxy]

By using this type of in-browser proxy, users are exposed to many privacy risks as well as the risk of infection. As traffic is encapsulated in another website, the authors of that website have complete control over any content delivered, and HTTPS provides no guarantee of privacy.

Fortunately for educational institutions, browser-based proxies are the easiest to identify and block. These sites can be tracked and blocked using standard URL-based filtering.

Virtual Private Networks (VPNs) and Proxy Server Tunnels

The second common type of filter avoidance software is the traditional tunnel. Tunnels are a technology that has many use cases, including granting road warriors remote access to internal network services.

In schools, tunnels can be used to encapsulate internet traffic and route it through a remote server in a fashion that is encrypted and unable to be identified by traditional network content filters and firewalls.

 

[Figure: Tunnel]

There are several mainstream VPN and proxy servers that provide this functionality, including OpenVPN and Squid Proxy. Users of these services can create their own tunnel servers at home or on cloud provided hardware like Amazon AWS. Then by using a client or configuring their browser, users route their traffic through the tunnel and the traffic content is encrypted.

Given that the central use of this technology is security rather than filter avoidance, this approach is remarkably easy to identify, either at OSI Layer 4 using ports or with signatures employed by Layer 7 DPI technologies.
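As a rough illustration of the Layer 4 case, the sketch below matches flow records against the default ports of the two services named above (OpenVPN defaults to port 1194 and Squid to TCP port 3128). The Flow record, the sample addresses and the classify_by_port helper are hypothetical names introduced for this example. A real firewall would combine port checks with Layer 7 signatures, since default ports are trivial to change.

```python
from typing import NamedTuple, Optional

# Default listening ports for the services named in the text.
# OpenVPN uses 1194 (UDP by default, TCP optional); Squid uses 3128/TCP.
WELL_KNOWN_TUNNEL_PORTS = {
    ("udp", 1194): "OpenVPN",
    ("tcp", 1194): "OpenVPN",
    ("tcp", 3128): "Squid proxy",
}


class Flow(NamedTuple):
    """Hypothetical flow record, as a firewall might log it."""
    src_ip: str
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    dst_port: int


def classify_by_port(flow: Flow) -> Optional[str]:
    """Return the tunnel service suggested by the destination port, if any."""
    return WELL_KNOWN_TUNNEL_PORTS.get((flow.protocol.lower(), flow.dst_port))


# Sample flows using documentation-only IP ranges (assumed data).
flows = [
    Flow("10.0.0.5", "203.0.113.10", "udp", 1194),
    Flow("10.0.0.6", "203.0.113.20", "tcp", 443),
    Flow("10.0.0.7", "203.0.113.30", "tcp", 3128),
]

flagged = [(f.src_ip, classify_by_port(f)) for f in flows if classify_by_port(f)]
print(flagged)
```

The HTTPS flow on port 443 is not flagged, which is exactly the gap that the masquerading techniques described in the next section exploit.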

 

A new wave of filter avoidance technologies

As outlined previously, both in-browser proxy servers and tunnel-based technologies are easy to identify and filter. To counter this, filter avoidance providers have developed a new set of techniques, focused solely on tunneling technology that is exceptionally hard to block.

Oppressive governments that seek to restrict free internet access, combined with an online population that has become increasingly concerned with privacy, are driving this arms race. Even customers of media companies such as Netflix, who adopt this technology to access content that is not available in their home country, are generating demand for VPN service providers.

This new filter avoidance technology is mostly referred to as VPNs and is mostly app-based. Looking in the Apple App Store, you will find hundreds of these systems available for easy download and purchase. The big difference between these systems and those discussed in the previous section is that they masquerade as normal traffic. From a firewall’s point of view, the traffic coming from these devices looks like normal, safe internet traffic.

[Figure: VPN masquerading]

Vendors like Hotspot Shield and Ultrasurf go to extreme lengths to encapsulate users' traffic in a form that masquerades as normal encrypted HTTPS traffic. As a result, firewalls employing DPI techniques see what looks like normal traffic and cannot distinguish between valid traffic and tunneled VPN traffic.

Some vendors claim to be able to protect against this type of masquerading by deploying client side certificates to allow for inspecting the HTTPS traffic. What they do not mention is that these VPNs often do not utilize the certificate authorities that are normally installed on client devices, so employing a Man-in-the-Middle (MITM) attack will not identify this traffic. Couple this with the cost of installing client side certificates on BYOD devices and educational institutions are left in the dark once again.

This approach of masquerading as normal traffic in itself is brilliant, but it does not end there. With the advent of Software Defined Networking (SDN) and programmatically accessible cloud platforms like Amazon AWS and Microsoft Azure it is now possible to automatically move and acquire new endpoints on an hourly basis. An entry level developer can easily deploy VPN endpoints to new servers and release old ones when firewall vendors identify them.

This pushes the cat and mouse game to a new level of intensity, leaving traditional vendors and their quarterly signature updates far behind and in no position to respond effectively to these techniques.

 

Using machine learning to identify VPNs

The constantly changing landscape of VPNs and filtering avoidance technologies has led Linewize to investigate new techniques of identification in combination with the traditional approach of creating DPI signatures. These new techniques involve using machine learning and automated statistical analysis to identify new VPN endpoints on a minute by minute basis.

Machine learning and statistical analysis are the art of looking for patterns in large datasets that can identify common groups of behavior and isolate a specific target. To construct our dataset, Linewize aggregates network traffic metadata collected across hundreds of networks, summarising terabytes of network traffic. Within this dataset, Linewize looks for behavioral patterns of traffic from filtering avoidance systems and malware.

A key differentiator of Linewize network management services is the comprehensive cloud based reporting capabilities. Linewize provides detailed aggregated and individualized reports for both real-time and historical network use. Linewize has built a cloud platform that is optimized for performing fast and detailed queries against network datasets that contain terabytes of network traffic metadata. This platform enables the application of pattern analysis and machine learning to identify malicious traffic.

Pattern recognition applied to network traffic

Several statistical and network techniques have been combined to successfully identify target traffic. Specific implementation details are proprietary but the following provides an outline of the high level approach used to identify VPN traffic.

Abnormal relationships

As unique as humans claim they are, the reality is that their behaviour online is not so unique after all. When you start looking at browsing behavior, common trends linked to personas can be identified. Let’s look at an example:

Student 1        | Student 2          | Student 3
facebook.com     | youtube.com        | facebook.com
youtube.com      | facebook.com       | emirates.com
nzherald.co.nz   | stuff.co.nz        | turkiye.gov.tr
Mathletics (app) | Khan Academy (app) | stuff.co.nz
iTunes (app)     | Mathletics (app)   | paypal.com
Coursera (app)   | TedEx (app)        | dogs.info

Looking at this dataset, it’s remarkably easy for a human to identify which user is different. Student 1 and Student 2 have both been using YouTube, Facebook and some educational and news related websites. Student 3 is a bit different: he or she has also been using Facebook and appears to have visited stuff.co.nz, a common news related website, but has also been visiting some websites that are less common for student access.

This in itself might be reason for a school counselor or administrator to have a chat with the student, but if we dig a little deeper it gets more interesting.

If we extend our profiling to 10,000 students rather than just 3 and start looking at these patterns, we might find that it’s very uncommon for a student who uses Facebook, YouTube and Coursera to visit emirates.com. Separately, it seems common for students who visit emirates.com to also visit turkiye.gov.tr and paypal.com. If we pivot the dataset a little and look at data volumes, we can see a clearer pattern emerge.
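One common way to quantify "sites that are rarely, or unusually often, visited together" is lift, a standard measure from association rule mining. The sketch below computes it over the three-student example; it is an illustration only and is not presented as the proprietary Linewize method. The visits dictionary and the function names are assumptions made for the example. A lift above 1 means two sites co-occur more often than chance would suggest, and a lift of 0 means they were never seen together.

```python
from collections import Counter
from itertools import combinations

# Sites seen per student, taken from the three-student example above.
visits = {
    "Student 1": {"facebook.com", "youtube.com", "nzherald.co.nz",
                  "Mathletics", "iTunes", "Coursera"},
    "Student 2": {"youtube.com", "facebook.com", "stuff.co.nz",
                  "Khan Academy", "Mathletics", "TedEx"},
    "Student 3": {"facebook.com", "emirates.com", "turkiye.gov.tr",
                  "stuff.co.nz", "paypal.com", "dogs.info"},
}


def lift_table(visits):
    """Lift of each co-occurring site pair: P(a and b) / (P(a) * P(b))."""
    n = len(visits)
    site_counts = Counter(s for sites in visits.values() for s in sites)
    pair_counts = Counter(
        pair
        for sites in visits.values()
        for pair in combinations(sorted(sites), 2)
    )
    # Algebraically equal to (count/n) / ((count_a/n) * (count_b/n)).
    return {
        (a, b): count * n / (site_counts[a] * site_counts[b])
        for (a, b), count in pair_counts.items()
    }


def lift(table, a, b):
    """Look up a pair regardless of argument order; 0.0 if never seen together."""
    return table.get(tuple(sorted((a, b))), 0.0)


table = lift_table(visits)
print(lift(table, "emirates.com", "paypal.com"))
print(lift(table, "facebook.com", "youtube.com"))
print(lift(table, "youtube.com", "emirates.com"))
```

With only three users the numbers are illustrative at best; the measure becomes meaningful at the scale of thousands of users described in the text.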

Let's focus our dataset around paypal.com. 

Student 1     | Data Volume | Student 2      | Data Volume | Student 3      | Data Volume
paypal.com    | 10M         | paypal.com     | 500M        | paypal.com     | 1.2G
amazon.com    | 20M         | emirates.com   | 1.2G        | emirates.com   | 500M
stuff.co.nz   | 10M         | turkiye.gov.tr | 20M         | turkiye.gov.tr | 1.4G
trademe.co.nz | 5M          | stuff.co.nz    | 1.5G        | stuff.co.nz    | 700M
iTunes (app)  | 1.2G        | edxio.info     | 5M          | paypal.com     | 12M
pbtech.co.nz  | 15M         | dogs.info      | 10M         | dogs.info      | 800M

From this dataset, it’s easy to see that something is wrong. Student 1 has visited paypal.com and some other shopping related websites and has transferred relatively tiny amounts of data to these sites. In comparison, Students 2 and 3 have no pattern of visiting other shopping sites that could be related to PayPal, and they show huge amounts of data transfer.

For a human, the pattern is clear on a small dataset. For a computer, these sorts of trends can be identified at huge scale. This sort of analysis is great for identifying users that are unusual but, more importantly, can be combined with other techniques to identify VPNs and malware masquerading as something they are not.
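The data volume comparison can be sketched as a simple rule: flag destinations that receive far more data than a site of that kind would normally need. The code below is a minimal illustration under loud assumptions: that "M" and "G" mean megabytes and gigabytes, that 100 MB is a sensible ceiling for non-media sites, and that iTunes is the only destination where large transfers are expected. The threshold, the allow-list and the function names are all hypothetical.

```python
# Per-student (destination, volume) rows from the data volume example above.
traffic = {
    "Student 1": [("paypal.com", "10M"), ("amazon.com", "20M"),
                  ("stuff.co.nz", "10M"), ("trademe.co.nz", "5M"),
                  ("iTunes (app)", "1.2G"), ("pbtech.co.nz", "15M")],
    "Student 2": [("paypal.com", "500M"), ("emirates.com", "1.2G"),
                  ("turkiye.gov.tr", "20M"), ("stuff.co.nz", "1.5G"),
                  ("edxio.info", "5M"), ("dogs.info", "10M")],
    "Student 3": [("paypal.com", "1.2G"), ("emirates.com", "500M"),
                  ("turkiye.gov.tr", "1.4G"), ("stuff.co.nz", "700M"),
                  ("paypal.com", "12M"), ("dogs.info", "800M")],
}

# Assumption: destinations where large transfers are normal (media downloads).
HIGH_VOLUME_EXPECTED = {"iTunes (app)"}
# Assumption: anything above this is unusual for a shopping, news or payment site.
THRESHOLD_MB = 100.0


def parse_volume_mb(text: str) -> float:
    """Convert '10M' or '1.2G' to megabytes (assumes M = MB, G = 1000 MB)."""
    units = {"M": 1.0, "G": 1000.0}
    return float(text[:-1]) * units[text[-1]]


def suspicious_destinations(rows):
    """Destinations with unexpectedly large transfers, in table order."""
    return [
        site
        for site, volume in rows
        if site not in HIGH_VOLUME_EXPECTED
        and parse_volume_mb(volume) > THRESHOLD_MB
    ]


report = {student: suspicious_destinations(rows) for student, rows in traffic.items()}
for student, sites in report.items():
    print(student, len(sites), sites)
```

Under these assumptions Student 1 produces no flags, while Students 2 and 3 each produce several, matching the reading of the table given above. A fixed threshold is the crudest possible choice; a per-site baseline learned from the wider population would be a more natural fit for the approach the text describes.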

Common denominators between unlikely partners

Malware and filter avoidance applications typically tend to share infrastructure that would otherwise be unlikely to be shared. This rule can be used to build on other identifiers and map out infrastructure that VPN providers are using at a rapid pace.

When dogs.info is identified as a VPN endpoint or malware control bot, it can be inferred that other traffic going to IP addresses related to the same domain will also be part of the VPN network.

Dogs.info -> 11.2.2.2 <- cats.info

These types of inferred relationships can be used to train identification engines and pre-emptively block malicious traffic.
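The dogs.info and cats.info relationship above amounts to walking a graph that links domains to the IP addresses they resolve to. The following sketch shows one way to expand a set of known-bad domains through shared IPs. The resolutions mapping, the extra stuff.co.nz address and the expand_suspects function are hypothetical and exist only to illustrate the inference.

```python
from collections import defaultdict

# Hypothetical domain -> observed IP addresses. The first two entries mirror
# the example in the text; the third uses a documentation-only address.
resolutions = {
    "dogs.info": {"11.2.2.2"},
    "cats.info": {"11.2.2.2"},
    "stuff.co.nz": {"198.51.100.7"},
}


def expand_suspects(resolutions, known_bad):
    """Follow shared IP addresses outward from known-bad domains.

    Returns (suspect_domains, suspect_ips).
    """
    # Invert the mapping so we can ask which domains share an IP.
    ip_to_domains = defaultdict(set)
    for domain, ips in resolutions.items():
        for ip in ips:
            ip_to_domains[ip].add(domain)

    suspect_domains = set(known_bad)
    suspect_ips = set()
    frontier = list(known_bad)
    while frontier:
        domain = frontier.pop()
        for ip in resolutions.get(domain, ()):
            suspect_ips.add(ip)
            for neighbour in ip_to_domains[ip]:
                if neighbour not in suspect_domains:
                    suspect_domains.add(neighbour)
                    frontier.append(neighbour)
    return suspect_domains, suspect_ips


domains, ips = expand_suspects(resolutions, {"dogs.info"})
print(sorted(domains), sorted(ips))
```

Shared hosting and content delivery networks place many unrelated sites on the same address, so in practice a shared IP is better treated as one piece of evidence to weigh alongside other signals than as proof on its own.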

High levels of unidentified traffic

Another technique for identifying suspicious behavior is rooted in the fact that Linewize identifies 98% of traffic and uses real time updates of classification information to stay ahead of new applications and websites.

With such a high probability that most traffic will be classified, looking for users with a high level of unidentified traffic proves to be a rapid way of training our machine learning to identify users that are utilizing filtering avoidance.

If Student 1 is active but traffic is going to only one IP address, let's say registered to an ISP in Russia, then this is an indicator that something is wrong. If we then search our network flow database for other users with traffic to that IP, we might find other users that also appear to be transferring data only to this IP address. By adding a little probability, we can be almost certain that this IP address is part of a tunnel VPN network, and it can be blocked immediately for all appliances.
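The search described above can be sketched in two steps: find users whose traffic is concentrated on a single IP address, then look for addresses that play that role for several users. The flow records, the 90% share threshold and the two-user minimum below are assumptions chosen for illustration, and every flow is treated as already unclassified. The function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (user, destination IP, bytes) records for unclassified traffic,
# using documentation-only IP ranges.
flows = [
    ("student1", "203.0.113.50", 900),
    ("student1", "198.51.100.7", 20),
    ("student2", "203.0.113.50", 800),
    ("student3", "198.51.100.7", 300),
    ("student3", "198.51.100.8", 350),
    ("student3", "198.51.100.9", 350),
]


def dominant_destinations(flows, min_share=0.9):
    """Map each user to the single IP receiving at least min_share of their bytes."""
    per_user = defaultdict(lambda: defaultdict(int))
    for user, ip, nbytes in flows:
        per_user[user][ip] += nbytes

    result = {}
    for user, by_ip in per_user.items():
        total = sum(by_ip.values())
        ip, top = max(by_ip.items(), key=lambda item: item[1])
        if total and top / total >= min_share:
            result[user] = ip
    return result


def likely_tunnel_endpoints(flows, min_users=2, min_share=0.9):
    """IPs that are the dominant destination for at least min_users users."""
    counts = defaultdict(int)
    for ip in dominant_destinations(flows, min_share).values():
        counts[ip] += 1
    return {ip for ip, n in counts.items() if n >= min_users}


print(dominant_destinations(flows))
print(likely_tunnel_endpoints(flows))
```

Requiring more than one user to show the same pattern is a simple stand-in for the "little probability" mentioned above: a single user fixated on one address could be a long download, while several users behaving identically is much harder to explain innocently.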

Rapid analysis and Constant Updates

Linewize appliances are in constant contact with the cloud platform and receive updates continually, sometimes on a minute by minute basis. A key differentiator between Linewize and traditional vendors is that we can use machine learning techniques to ensure our appliances are able to identify and filter content as it emerges and changes online. At scale, this means that even if your network is not subjected to VPN or filter avoidance right now, you will still benefit from Linewize’s ability to identify and block this content should someone appear on your network with this technology installed.

The Solution

As can be inferred from the information above, even with machine learning techniques, identifying and filtering VPNs and malware is still a cat and mouse game. However, with automatic identification, machine learning and datasets spanning thousands of networks, it is possible to stay pinned to the mouse's tail.

Linewize is able to quickly and automatically identify new traffic types and block elusive VPNs like Hotspot Shield and Ultrasurf effectively. By combining real-time updates and machine learning technologies, Linewize provides visibility and control over unwanted VPN usage on your network.
