Comment by atehwa.

October 14, 2014, 13:03

The section "Check whether two files contain the same data" claims that the awk solution is O(n). That holds only if creating and looking up associative array entries is amortised O(1). Surprisingly many utilities implement associative arrays whose operations degrade to O(n) each once the data is sufficiently large, which makes the whole solution O(n²). Typical causes are fixed-size hashtables and hashtables that grow by a constant amount on each resize.
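To illustrate why the resize policy matters, here is a rough sketch that counts how many entries get moved by resizes under two growth policies. It is a toy model, not how any particular awk works: the initial capacity of 8, the growth step of 8, and the function names are all made up for illustration, and the table is assumed to resize only when completely full.

```python
def rehash_copies(n_items, grow):
    """Count entries moved by resizes while inserting n_items into a table.

    `grow` maps the current capacity to the new capacity. The (hypothetical)
    table resizes whenever it is full, and every resize moves all entries.
    """
    capacity = 8  # assumed initial capacity, chosen arbitrarily
    size = 0
    copies = 0
    for _ in range(n_items):
        if size == capacity:
            copies += size  # a resize rehashes every existing entry
            capacity = grow(capacity)
        size += 1
    return copies


def constant_growth(capacity):
    # Grow by a fixed number of slots: total copying is quadratic in n.
    return capacity + 8


def doubling_growth(capacity):
    # Grow geometrically: total copying stays linear in n.
    return capacity * 2


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, rehash_copies(n, constant_growth), rehash_copies(n, doubling_growth))
```

With doubling, the total number of moved entries stays below 2n, so each insert is amortised O(1). With constant growth, it grows roughly as n²/16 in this model, so each insert is amortised O(n).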

If the hashtable implementation _is_ good, then the solution's linearity depends on all the data fitting in memory. If insufficient memory is available, sort is often a faster alternative, though this depends on many factors.
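For comparison, here is a minimal sketch of the two approaches. It assumes that "the same data" means the same lines with the same multiplicities, in any order; the article's actual awk solution may define it differently. The function names are hypothetical, and both versions work in memory, so the sorted one only shows the shape of the O(n log n) approach, not the on-disk behaviour of a sort utility.

```python
from collections import Counter


def same_lines_hashed(lines_a, lines_b):
    # Hash-based: expected O(n), but every distinct line must fit in memory.
    return Counter(lines_a) == Counter(lines_b)


def same_lines_sorted(lines_a, lines_b):
    # Sort-based: O(n log n); after sorting, a sequential comparison suffices.
    return sorted(lines_a) == sorted(lines_b)


if __name__ == "__main__":
    a = ["x", "y", "x"]
    b = ["x", "x", "y"]
    print(same_lines_hashed(a, b), same_lines_sorted(a, b))
```

The sort-based version does more work asymptotically, but once the inputs are sorted it reads them sequentially, which is the access pattern that tends to hold up when data no longer fits in memory.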
