How to Identify Fumbling to Keep a Network Secure

Repeated, systematic failed attempts by a host to access resources such as a URL, an IP address or an email address are known as fumbling. Erroneous attempts by legitimate users to access resources must not be confused with fumbling. Let’s look at how we can d…

OpenSource For You. By: Dipankar Ray. The author is a member of IEEE and IET, and has more than 20 years of experience in open source versions of UNIX operating systems and Sun Solaris. He is presently working on data analysis and machine learning using a neural network as we…

Network security implementation depends largely on exploratory data analysis (EDA) and visualisation. EDA provides a way to examine a data set without preconceived assumptions about the data and its behaviour. Because the behaviour of the Internet and of attackers is dynamic, EDA is a continuous process that helps identify phenomena that are cause for alarm and detect anomalies in access to resources.

Fumbling is a general term for repeated, systematic failed attempts by a host to access resources. For example, legitimate users of a service should have a valid email ID or user identification. So if a user from a different location makes numerous attempts to target the users of this service with different email IDs, there is a chance that this is an attack from that location. From the data analysis point of view, we say a fumbling condition has occurred.

This indicates that the user does not have access to that system and is exploring different possibilities to break the security of the target. It is the task of security personnel to identify the pattern of the attack and to differentiate the attacker’s mistakes from innocent errors. Let’s now discuss a few examples to identify a fumbling condition.

In a nutshell, fumbling is a pattern typical of Internet attacks, characterised by systematic failures to connect to a target from one or more locations. After a brief discussion of this type of network intrusion, let’s consider a network data analysis problem using R, which is a good choice as it combines powerful statistical data analysis tools with graphical visualisation for a better understanding of the data.

Fumbling of the network and services

In TCP fumbling, a host fails to reach a target port of another host, whereas in HTTP fumbling, a client fails to access a target URL. Not all fumbling is a network attack, but most suspicious attacks appear as fumbling.

The most common reason for fumbling is lookup failure, which happens mainly due to misaddressing, the movement of the host or the non-existence of a resource. Other possible causes are an automated search for destination targets and the scanning of addresses and their ports. Sometimes, automated measures are taken to check whether a target host is up and running. These failed attempts are often mistaken for network attacks, although a lookup failure can simply result from a misconfigured DNS, a faulty redirection on the Web server, or an email with a wrong URL. Similarly, SMTP communication uses an automated network traffic control scheme for its destination address search.

The most serious cause of fumbling is repeated scanning by attackers. Attackers scan the address-port combination matrix either horizontally (the same port across many addresses) or vertically (many ports on a single address). Generally, attackers explore horizontally, as they are most interested in finding any host with a potential vulnerability. Vertical scanning is basically a defensive approach, used to identify an attack on the open ports of an address. As an alternative to scanning, attackers at times use a hit-list to explore a vulnerable system. For example, to identify SSH hosts, attackers may use a blind scan and then start a password attack.
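The horizontal and vertical patterns described above can be told apart with basic utilities. The following is only a minimal sketch: it assumes a hypothetical input format of one failed attempt per line ("source destination port"), and the function name classify_scans is our own. A source that touches more distinct addresses than ports is labelled horizontal, and the reverse is labelled vertical.

```shell
# Classify each source by the shape of its failed connection attempts.
# Input (hypothetical format): one failed attempt per line, "src dst dport".
classify_scans() {
    awk '
    NF >= 3 {
        # Count each distinct (src, dst) and (src, port) pair only once.
        if (!(($1, $2) in seen_host)) { seen_host[$1, $2] = 1; hosts[$1]++ }
        if (!(($1, $3) in seen_port)) { seen_port[$1, $3] = 1; ports[$1]++ }
    }
    END {
        for (src in hosts) {
            kind = "mixed"
            if (hosts[src] > ports[src]) kind = "horizontal"
            else if (ports[src] > hosts[src]) kind = "vertical"
            printf "%s hosts=%d ports=%d %s\n", src, hosts[src], ports[src], kind
        }
    }' | sort
}
```

It can be fed from any file in that format, for example: classify_scans < failed_attempts.txt (the file name is illustrative).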

Identifying fumbling

Identifying malicious fumbling is not a trivial task, as it requires separating innocuous fumbling from the malevolent kind. The primary task in assessing failed accesses to a resource is to determine whether the failure is consistent or transient. To explore TCP fumbling, look into the TCP flags, payload size and packet count. In TCP communication, the client sends an ACK only after receiving the SYN+ACK from the server. If the client’s SYN is never followed by an ACK, that indicates fumbling. Another way to locate a malicious attack is to count the number of packets in a flow. A legitimate TCP flow requires at least three packets of overhead before it can transmit data. Most retries involve three to five packets, so TCP flows with five packets or fewer are likely to be fumbles.

Since a host repeatedly sends the same SYN packet options during a failed connection, the ratio of packet size to packet count is also a good measure for identifying TCP fumbling.
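The TCP heuristics discussed so far (no ACK flag, five packets or fewer, and the size-to-count ratio) can be combined in a few lines of awk. This is a sketch under stated assumptions, not a definitive detector: the comma-separated flow record format (src,dst,dport,packets,bytes,flags, with flags given as the letters seen in the flow, such as S, SA or SAPF) and the function name flag_fumbles are hypothetical.

```shell
# Flag TCP flows that look like fumbles.
# Input (hypothetical format): src,dst,dport,packets,bytes,flags
flag_fumbles() {
    awk -F, '
    $4 + 0 > 0 {                                   # skips headers and empty flows
        reason = ""
        if (index($6, "A") == 0) reason = "no-ack"      # handshake never completed
        else if ($4 <= 5)        reason = "short-flow"  # too few packets to carry data
        if (reason != "")
            printf "%s -> %s:%s %s bytes/pkt=%.1f\n", $1, $2, $3, reason, $5 / $4
    }'
}
```

The bytes-per-packet figure is printed so that flows made of repeated, identical SYN packets stand out by their small and constant ratio. The thresholds should be tuned to the traffic of the network being monitored.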

ICMP informs a user about why a connection failed, so it is also possible to look into the ICMP response traffic to identify fumbling. If there is a sudden spike in messages originating from a router, there is a good chance that a host is probing the router’s network. A proper forensic investigation can then identify the possible attacking host.

Since UDP, unlike TCP, is not a strict connection-oriented protocol, the easiest way to identify UDP fumbling is by exploring network mapping and ICMP traffic.

Identifying service-level fumbling is comparatively easier than identifying communication-level fumbling, as in most cases exhaustive logs record each access and malfunction.

For example, HTTP returns a three-digit 4xx status code for every client-side error. Among these codes, 404 and 401 are the most common, indicating unavailability of a resource and unauthorised access, respectively. Most 404 errors are innocuous, as they occur due to a misconfigured URL or the internal vulnerabilities of different services of the HTTP server. But if there is 404 scanning, the traffic may be malicious, and attackers may be trying to guess object names in order to reach a vulnerable target. Web server authentication is rarely used by modern Web servers, so if a log entry with a 401 error is discovered, proper steps should be taken to remove the source from the server.
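As an illustration of the log-based approach, the sketch below counts 404 and 401 responses per client. It assumes the access log is in the Common Log Format, where the client address is the first field and the status code is the ninth whitespace-separated field. The function name count_http_fumbles and the default threshold of 3 are our own choices.

```shell
# Report clients with many 404/401 responses.
# Assumes Common Log Format lines on stdin: field 1 = client, field 9 = status.
count_http_fumbles() {
    # $1: minimum number of matching responses before a client is reported (default 3)
    awk -v min="${1:-3}" '
    $9 == 404 || $9 == 401 { hits[$1 " " $9]++ }
    END {
        for (key in hits)
            if (hits[key] >= min) print key, hits[key]
    }' | sort
}
```

A call such as count_http_fumbles 20 < access.log would list every client that triggered at least 20 errors of either kind. A few scattered 404s are normal; a long run from a single address deserves a closer look.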

Another common service-level vulnerability comes from the mail service protocol, SMTP. When a host sends mail to a non-existent address, the server either rejects the mail or bounces it back to the source. Sometimes it also directs the mail to a catch-all account. In all three cases, the routing SMTP server keeps a record of the mail delivery status. But the main hurdle in identifying SMTP fumbling comes from spam. It is hard to differentiate SMTP fumbling from spam, as spammers send mail to every conceivable address. SMTP fumblers also send mail to target addresses to verify whether an address exists, possibly to scout out the target.

Designing a fumbling identification system

From the above discussion, it is apparent that identifying fumbling is more subjective than objective. Designing a fumbling identification and alarm system requires in-depth knowledge of the network and its traffic pattern. There are several network tools, but here we will cover some basic system utilities so that readers can explore the possibilities of designing network intrusion detection and prevention systems of their own.

In order to separate malicious from innocuous fumbling, the analyst should mark the targets to determine whether the attackers are reaching the goal and exploring the target. This step reduces the bulk of data to a manageable state and makes the task easier. After fixing the target, it is necessary to examine the traffic to study the failure pattern. TCP fumbling, as mentioned earlier, can be detected by finding traffic without the ACK flag. In the case of HTTP scanning, the HTTP server log is examined for 404 or 401 errors to find malicious fumbling. Similarly, the SMTP server log helps us find doubtful emails and identify the attacking hosts.

If scouting targets the dark space of a network (addresses that are not in use), then the chance of a malicious attack is high. Similarly, if a scanner scans more than one port in a given time frame, the chance of intrusion is high. A malicious attack can be confirmed by examining the conversation between the attacker and the target. Suspicious conversations include subsequent transfers of files or communication using odd ports.
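Dark-space scouting can be spotted by comparing connection attempts against a list of addresses known to be in use. The sketch below is hedged accordingly: the file of live addresses, the "source destination port" input format and the function name dark_space_hits are all assumptions made for illustration.

```shell
# Count, per source, the attempts aimed at addresses that are not in use.
# stdin (hypothetical format): one attempt per line, "src dst dport".
dark_space_hits() {
    # $1: file listing one populated (live) address per line; must not be empty
    awk '
    NR == FNR { live[$1] = 1; next }   # first file: remember the live hosts
    !($2 in live) { dark[$1]++ }       # attempt aimed at an unused address
    END { for (src in dark) print src, dark[src] }
    ' "$1" - | sort
}
```

For example, dark_space_hits live_hosts.txt < attempts.txt prints each source followed by its number of dark-space attempts (both file names are illustrative). Legitimate users rarely have a reason to contact unused addresses, so even a small count is worth investigating.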

Some statistical techniques are also available to estimate the expected number of hosts of a target network that a user would explore, or to compute the likelihood that a test for a fumbling attack passes or fails.
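One simple way to make this concrete is to assume that a source with no knowledge of the network picks its targets uniformly at random and independently. Under that assumption (which is ours, not a method stated in this article), the expected number of populated hosts reached in n probes is n × k/N, and the probability that all n probes fail is (1 − k/N)^n, where N is the number of addresses and k the number of populated ones. The function name fumble_odds is hypothetical.

```shell
# Expected hits and chance of total failure for blind, uniform random probing.
fumble_odds() {
    # $1: addresses in the target network, $2: populated addresses, $3: probes observed
    awk -v total="$1" -v live="$2" -v probes="$3" 'BEGIN {
        p_hit = live / total
        printf "expected_hits=%.2f p_all_fail=%.4f\n", probes * p_hit, (1 - p_hit) ^ probes
    }'
}
```

For a /24 network with 64 populated addresses, fumble_odds 256 64 4 reports one expected hit in four probes. A source whose success rate stays close to this blind-guessing baseline probably does not know the network, whereas legitimate users, who know their targets, should succeed far more often.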

Capturing TCP flags

In a UNIX environment, the de facto packet-capturing tool is tcpdump. It is powerful as well as flexible. As it is a UNIX tool, shell scripts can be applied to the output of tcpdump to produce a filtered report as desired. The underlying packet-capturing library of tcpdump is libpcap, and it provides the source and destination IP addresses, ports and IP protocol for each packet on the target network interface. For example, to capture TCP SYN packets over the eth0 interface, we can use the following command:

$ tcpdump -i eth0 "tcp[tcpflags] & (tcp-syn) != 0" -nn -v

Similarly, TCP ACK packets can be captured by issuing the command given below:

$ tcpdump -i eth0 "tcp[tcpflags] & (tcp-ack) != 0" -nn -v

To have a combined capture report of SYN and ACK packets, both flags can be combined as follows:

$ tcpdump -i eth0 "tcp[tcpflags] & (tcp-syn | tcp-ack) != 0" -nn -v
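For fumbling analysis, the packets of most interest are connection openers: those with SYN set and ACK clear. The filter below masks the two flags and compares the result with tcp-syn alone. The variable and function names are our own, and the wrapper is only a sketch; capturing normally requires root privileges.

```shell
# pcap filter: SYN set and ACK clear, i.e. connection openers only.
SYN_ONLY_FILTER='tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn'

capture_syn_only() {
    # $1: interface (for example eth0), $2: number of packets to capture before exiting
    tcpdump -i "$1" -nn -c "$2" "$SYN_ONLY_FILTER"
}
```

Running capture_syn_only eth0 100 prints the first 100 opening SYN packets seen on eth0. A source that appears here repeatedly without ever showing up in an ACK capture is a candidate fumbler.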

Getting network information

In this regard, netstat is a useful tool to get network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. It provides a detailed view of the network to diagnose network problems. In our case, we can use it to identify ports that are listening. For example, to see the TCP sockets, including those of HTTP and HTTPS, on which services are waiting for connections, we can use the following command with the -t (only TCP), -l (only listening sockets) and -p (show the PID and name of the program owning each socket) options.

$ netstat -tlp
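The listening ports can be pulled out of this report with a short filter. The sketch below assumes the column layout printed by the Linux net-tools version of netstat when it is run with -n (numeric addresses), where the local address is the fourth column and the state is the sixth. The function name listening_ports is our own.

```shell
# Print the TCP ports that are in the LISTEN state.
# Reads `netstat -tln` style output on stdin.
listening_ports() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
        n = split($4, part, ":")   # local address looks like 0.0.0.0:22 or :::80
        print part[n]              # the port is the text after the last colon
    }' | sort -n -u
}
```

It is used as netstat -tln | listening_ports. Knowing which ports are really open makes it easier to judge whether a burst of failed connections was aimed at services that exist or at ports nobody is listening on.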

Data analysis

Now, let’s discuss a network data analysis example based on the output of the netstat command. This will help you to understand the network traffic in order to carry out intrusion detection and prevention.

Let’s say we have a CSV file from the netstat command, read as shown below:

> rfi <- read.csv("rficsv.csv", header=TRUE, sep=",")

…where the dimension, columns and object class are:

> dim(rfi)
[1] 302 11

> names(rfi)
[1] "ccadmin" "pts.0" "X" "ipaddress" "Mon" "Oct"
[7] "X30" "X17.25" "X.1" "still" "logged.in"

> class(rfi)
[1] "data.frame"

To make the relevant column headings meaningful, the first and fourth column headings are changed to:

> colnames(rfi)[1] = 'user'
> colnames(rfi)[4] = 'ipaddress'

If we consider a few selected columns of the data frame, as shown here:

> c = colnames(rfi)[c(1, 2, 4, 5, 6, 7, 8)]

…then the first ten rows can be displayed to have a view of the table structure, as shown in Figure 1.

> x = rfi[, c, drop=F]

> head(x,10)
       user  pts.0        ipaddress Mon Oct X30 X17.25
1      root  pts/1     172.16.7.226 Mon Oct  30  12:48
2   ccadmin  pts/0     172.16.5.230 Mon Oct  30  12:30
3   ccadmin  pts/0     172.16.5.230 Wed Oct  25  10:22
4      root  pts/1     172.16.7.226 Tue Oct  24  11:54
5   ccadmin  pts/0     172.16.5.230 Tue Oct  24  11:53
6  (unknown     :0               :0 Thu Oct  12  12:57
7      root  pts/0               :0 Thu Oct  12  12:57
8      root     :0               :0 Wed Oct  11  12:56
9  (unknown     :0               :0 Wed Oct  11  12:55
10   reboot system 3.10.0-123.el7.x Thu Oct  12  12:37

The output shows that the data frame is not in a uniform table format, and that the fields of the records are separated by a tab character. This requires some filtering of the data in the table to extract the relevant rows for further processing. Since I will be demonstrating the distribution of IP addresses within a system, only the IP address and other related fields are kept for histogram plotting.

To have a statistical evaluation of this data, it is worth removing all the irrelevant fields from the data frame:

drops = colnames(rfi)[c(2, 3, 5:11)]
d = rfi[ , !(names(rfi) %in% drops)]

Then, for simplicity, extract all the records of the user ‘ccadmin’ whose IP addresses start with ‘172’.

library(data.table)  # assumed source of like(), which matches a regular expression
u = d[like(d$user, 'ccadmin') & like(d$ipaddress, '^172'), ]

Now the data is ready for analysis. The R summary command shows the count of elements of each field, whereas the count command (available in the plyr package) shows the frequency distribution of the IP addresses, as shown below:

> summary(u)
       user          ipaddress
 ccadmin :34   172.16.6.252:21
         : 0   172.16.7.155: 6
 (unknown: 0   172.16.5.230: 3
 backup_u: 0   172.16.11.95: 2
 reboot  : 0   172.16.4.66 : 1
 root    : 0   172.16.5.132: 1
 (Other) : 0   (Other)     : 0

and

> count(u)
     user    ipaddress freq
1 ccadmin 172.16.11.95    2
2 ccadmin  172.16.4.66    1
3 ccadmin 172.16.5.132    1
4 ccadmin 172.16.5.230    3
5 ccadmin 172.16.6.252   21
6 ccadmin 172.16.7.155    6

For better visualisation, this frequency distribution of the IP addresses can be depicted using a histogram, as follows:

library(ggplot2)  # provides qplot()
qplot(u$ipaddress, main='IP histogram', xlab='IP address', ylab='Frequency')

Figure 1: Histogram of IP addresses of netstat
