AnonymousCoward
November 26, 2009, 13:16

Good work, but there are some gotchas that need to be mentioned, just in case someone is interested in awk networking.

1) Although awk itself is portable, the underlying environment is not.
If you have a UTF-8 enabled environment,
do not expect the awk-youtube downloader (or any other awk script that handles binary files) to work.
This is due to UTF normalization and such, but also because the length() function in awk has a bug. A patch has been submitted upstream, so it should work in later versions of awk.
If you are using length($0 RT) to get the number of bytes read, do not expect it to return the correct value.
A temporary workaround is to set the LC_ALL and LANG variables to "C", and then run the awk script.
Note that some versions of awk ignore this in order to handle multibyte characters, so this is not a fix, but rather a workaround that may or may not work. YMMV.
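To see whether your own awk is affected, a quick check along these lines may help. This is only a sketch: the helper names awk_len and awk_len_c and the sample string are made up for illustration, and the numbers you get depend on which awk and which locale you have.

```shell
# Hypothetical helper: print the length awk reports for a string
# under the caller's current locale settings.
awk_len() {
    printf '%s\n' "$1" | awk '{ print length($0) }'
}

# Hypothetical helper: same measurement with the locale forced to "C"
# for this one awk process only, so length() should count bytes.
awk_len_c() {
    printf '%s\n' "$1" | LC_ALL=C LANG=C awk '{ print length($0) }'
}

sample='naïve'    # 5 characters, 6 bytes when encoded as UTF-8
echo "current locale: $(awk_len "$sample")"
echo "C locale:       $(awk_len_c "$sample")"
echo "wc -c says:     $(printf '%s' "$sample" | wc -c)"
```

If the "C locale" line does not agree with wc -c, your awk is probably one of those that ignore the locale setting, and the workaround will not help there.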

2) The GNU awk bit manipulation functions, such as xor, and, or, compl, rshift, lshift, are environment-specific. This is a known bug, and a patch has been sent upstream. Always check PROCINFO["version"] before using them. There are replacement scripts floating around the net for awk platforms without bit manipulation (they do not depend on the underlying platform, but are rather slow) in case you absolutely need it.
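As a rough illustration of both points, the sketch below first checks for PROCINFO["version"] (which only gawk fills in) and then shows a slow, arithmetic-only bitwise AND of the kind those replacement scripts use. The function names awk_is_gawk and portable_and are invented for this example, and the fallback assumes non-negative integers.

```shell
# Hypothetical helper: succeeds only when the awk on PATH is gawk,
# which is the only awk that populates PROCINFO["version"].
awk_is_gawk() {
    awk 'BEGIN { exit !("version" in PROCINFO) }'
}

# Hypothetical pure-awk fallback: bitwise AND of two non-negative integers
# using only arithmetic, so it does not rely on the built-in and().
portable_and() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        result = 0; bit = 1
        while (a > 0 && b > 0) {
            # Add this bit to the result only if it is set in both inputs
            if (a % 2 == 1 && b % 2 == 1) result += bit
            a = int(a / 2); b = int(b / 2); bit *= 2
        }
        print result
    }'
}

if awk_is_gawk; then
    echo "gawk detected: $(awk 'BEGIN { print PROCINFO["version"] }')"
else
    echo "not gawk; falling back to arithmetic"
fi
portable_and 12 10
```

The loop walks one bit at a time, which is why this kind of fallback is portable but much slower than the built-in functions.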

Here is an example workaround to put before running awk. Note that this is only a workaround hack, and is very _different_ from when one uses the 'sort' command. This really is a length() function bug in awk, and should be fixed.

# Export both variables so that the awk child process actually sees them
export LANG=C
export LC_ALL=C

awk --re-interval '{
 # Your code here
}'

Happy hacking, Peteris!

From the AnonymousCoward
