MetaChat

11 February 2006

AskMeCha - help! I need to make logs understandable, I need purdy graphs. I'm too damn tired to think straight. Ho-kay, I got me a biiiig ol' logfile that saves lines of info like this:

66.68.119.88 - cpe-66-68-119-88.austin.res.rr.com - 2006-01-21 (15:00) - 1137852019 - question6 - 2 - http://wklondon.typepad.com/ WK London - http://ad-rag.com/battleoftheadblogs/question6.php - Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8

"Question6" reveals that this is from some people voting on stuff. Anyway, now I need to find out if the same IP (person) voted more than once on the same question, so I'd like to sort this into some graph or... something. And my brain is completely fried. Any wise people out there able to help me? For example, all the lines that start with "unknown" (instead of an IP) I'd like to see in one go: who did they vote for? etc. I don't even know where to begin.
(Oh, and I've tried awstats, the Apache log analyzer, and just plain old reading the damn thing. Not really helping me see the big picture. There are accusations of cheating and I'd like to make sure that none was going on.)
posted by dabitch 11 February | 19:52
Know any perl?

Use a bit of perl to snarf the log lines into a mysql database. If these are apache logs, there's prolly a CPAN module that will do the parsing for you.

Alternatively, you might be able to pull these right into an Access or SQL Server DB by splitting on the " - " delimiter...

Once you've got your stuff in a DB, query away to your heart's content.
posted by killdevil 11 February | 19:56
you want to use the linux command line utilities: sort, which will sort the log file; cut, which will isolate just those columns you care about; and uniq, which will reduce repeating lines to one line. uniq -c will do what uniq does and also add a count of how many times each line repeats.

I do exactly that to see how many unique IPs are using my FF extensions.
posted by orthogonality 11 February | 19:57
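[A tiny demonstration of the uniq / uniq -c distinction ortho describes, on made-up IPs. Note the input has to be sorted first, because uniq only collapses *adjacent* duplicate lines.]

```shell
# uniq collapses adjacent duplicate lines; uniq -c does the same and
# prefixes each surviving line with its repeat count. Sort first so
# duplicates end up adjacent.
printf '%s\n' 1.2.3.4 1.2.3.4 5.6.7.8 | sort | uniq
printf '%s\n' 1.2.3.4 1.2.3.4 5.6.7.8 | sort | uniq -c
```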
whoha, my brain is fried. Did I mention that? The mysql query route might have been it, but, brain fried. (Buncha kids had a party here today. Me, not used to being mum, pushed very hard as I made sure small and big kids didn't hurt themselves or the babies. My god, bloody suicidal crazy little terrors! Someone please put me in a bed now.)

Ok, I'll explain my dumb problem again, you seem so much smarter than me. ;) I had 16 different questions where people could vote, and one ip# could only vote once. Now, people are saying that one ip# could vote unlimited times, so basically I'd like to quickly find out if all votes on question5 came from different ip#s, or just see if an ip# voted a whole bunch of times on one question, 'cause that would be wrong. ... yeah, I guess I could query the mysql server up and down and around again, but I'd really like some pretty graphs that just looked at the whole file for me... Just to get the big picture. "All votes to 'WK London' came from these ip's." Like. Hmm.
posted by dabitch 11 February | 20:16
ortho is basically right here. assuming the sample line above only includes the vote data i would use something like this:
cat nameoflogfile | cut -d" " -f1 | sort | uniq -c

this will give you each ip address followed by the number of occurrences

if the above doesn't exclude the rest of the log file then i'd probably use gawk/awk instead of cat to feed the above command line. i'm currently too fucked to be helpful on that front.

ps: i'm very drunk so there are probably large parts of this that are very very wrong
posted by dodgygeezer 11 February | 20:41
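[dodgygeezer's pipeline, run on a couple of fabricated lines in the same shape as the log: cut -d" " -f1 keeps everything up to the first space, which on this format is the IP, and sort | uniq -c tallies how many lines each IP produced.]

```shell
# Count log lines per IP. cut -d" " -f1 keeps only the first
# space-separated token (the IP); sort | uniq -c tallies duplicates;
# the final sort -rn puts the heaviest voters on top. Data is made up.
printf '%s\n' \
  '66.68.119.88 - host-a - question6 - 2' \
  '66.68.119.88 - host-a - question6 - 2' \
  '10.0.0.1 - host-b - question6 - 1' |
cut -d" " -f1 | sort | uniq -c | sort -rn
```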
The problem is that it looks like your log file has a multi-character delimiter, namely " - ", which sort can't handle. I think you're looking for something like:

perl -ple '$_ = join("\t", (split(/ - /))[6,0])' data.txt | sort | uniq -c

where "6,0" is a list of the columns you're interested in, starting from zero. For graphing the results, Excel's probably the simplest route.

By the way, there are many reasons why multiple votes might come from the same IP address, including libraries, proxies, and anti-virus software.
posted by eamondaly 11 February | 20:55
eamon, she can use cut to cut out the fields she doesn't care about, using -d " " as the delim
posted by orthogonality 11 February | 22:12
It's official, a drunk dodgy is more awake than a pooped-out-after-kid-party Dabitch. Maybe one should label kids with "do not operate heavy machinery after handling"? I reckon eamondaly makes sense too. :) Anyway, thanks so much, now that I've had some sleep and some coffee I can get to work! *mwah mwah*! Bless you all for your help. Seriously. I was so brainfried last night it wasn't even funny. killdevil asks "know any perl" and I'm ashamed to admit I even have a bit of a script tattooed on my arm, in perl. That's how fried my brain was, I didn't even think of using it.
By the way, there are many reasons why multiple votes might come from the same IP address, including libraries, proxies, and anti-virus software.

True, kids at small ad agencies were unhappy that their 12 machines had the same IP so they could only vote once, it's a crude way of trying to keep people from cheating, but easy.
posted by dabitch 12 February | 07:44
heh, with cat log.txt | awk '{print $1 "\t " $10}'|grep question |sort -n | uniq -c |sort -nr | head -40 I discovered many many cheaters. Tssssk.
posted by dabitch 12 February | 10:15
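[Why $10 works here: awk splits on any whitespace, so on the log format above the " - " separators count as tokens 2, 4, 7 and 9, the timestamp's "(15:00)" is its own token, and the question field lands in $10 with the IP in $1. A reconstruction of dabitch's pipeline on fabricated data (plain sort instead of her sort -n, same effect of grouping duplicates):]

```shell
# dabitch's pipeline on fabricated lines: print IP and question field,
# keep only question lines, count duplicate (IP, question) pairs, and
# float the worst offenders to the top.
printf '%s\n' \
  '1.2.3.4 - host-a - 2006-01-21 (15:00) - 1137852019 - question6 - 2' \
  '1.2.3.4 - host-a - 2006-01-21 (15:01) - 1137852079 - question6 - 2' \
  '5.6.7.8 - host-b - 2006-01-21 (15:02) - 1137852139 - question6 - 1' |
awk '{print $1 "\t" $10}' | grep question | sort | uniq -c | sort -nr | head -40
```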
thanks for all your help guys. Really. Super!
posted by dabitch 12 February | 10:16
cool. glad to help.
posted by dodgygeezer 12 February | 11:07