
Jimmy
August 18, 2010, 01:34

Hi, I'm wondering if I could get some help. Sorry for the double post.

I have a column of numbers sorted in ascending order. I am trying to remove the last 5% of the records, then compute the count, average, sum, etc.

The total number of records is: 99183
I only want to sum the first: 94222 (discarding the top 5% as outliers)

Where the value 94222 appears in the command line is where I want to use the variable NFIVE,

but if I put the variable in, awk counts all the records and sums none of them.

Desired output:

cat <file> |tr -s '=' ' '|sort -k5n | awk '{NFIVE=NR*.95}; {if (NR<94222) TOTAL+=$5} END{printf("COUNT:%d, TOTAL:%d,MEAN:%d\n",NFIVE,TOTAL,TOTAL/NFIVE)}'

Output (correct values):
COUNT:94222, TOTAL:19079403, MEAN:202

Incorrect values I get when using NFIVE:

cat <FILE> |tr -s '=' ' '|sort -k5n | awk '{NFIVE=NR*.95}; {if (NR<NFIVE) TOTAL+=$5} END{printf("COUNT:%d, TOTAL:%d, MEAN:%d\n",NFIVE,TOTAL,TOTAL/NFIVE)}'

COUNT:94222, TOTAL:0, MEAN:0

Thanks for any assistance.
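
A likely cause, for what it's worth: `{NFIVE=NR*.95}` runs on every record, so NFIVE is always 95% of the current NR and the test `NR<NFIVE` can never be true. awk only knows the final record count once it reaches END. One possible workaround is to buffer the sorted values in an array and do the summing in END. The sketch below assumes the values are already in field 5; the function name `trimmed_stats`, the array `val`, and the 20-record sample input are made up for illustration.

```shell
trimmed_stats() {
  # Reads whitespace-separated records on stdin; field 5 holds the value.
  sort -k5n | awk '
    { val[NR] = $5 }                 # buffer every value in sorted order
    END {
      keep = int(NR * 0.95)          # NR is the full record count only here
      for (i = 1; i <= keep; i++) total += val[i]
      if (keep > 0)
        printf("COUNT:%d, TOTAL:%d, MEAN:%d\n", keep, total, total / keep)
    }'
}

# Hypothetical sample: 20 records whose fifth field is 1..20
awk 'BEGIN { for (i = 1; i <= 20; i++) print "a b c d", i }' | trimmed_stats
```

With the real data this would be used as `cat <file> | tr -s '=' ' ' | trimmed_stats`. Buffering keeps everything in one pass at the cost of holding all values in memory; for about 100,000 records that should not be a problem.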
