You're viewing a comment by John and its responses.

John
December 03, 2008, 17:49

Hi Peter,

Nice article. I enjoy your work.

I noticed a problem with one of your published methods for set intersection:

sort A B | uniq -d 

will also report items that are duplicated within set A or within set B as intersections, even when they appear in only one of the two sets. So one has to make sure A and B are each uniq'd first, which makes the command a bit more complicated, and it will run a lot slower:

sort <(sort A | uniq) <(sort B | uniq) | uniq -d

Perhaps you might see a way to do this more efficiently.
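
A small sketch of the problem and one possible simplification, using made-up sample files (the names A.txt and B.txt and their contents are assumptions for illustration, not data from the article). Here `sort -u` deduplicates each file in one step, and grouping the two commands avoids process substitution, so the pipeline should also run in a plain POSIX shell. Whether it is actually faster on large inputs has not been measured here.

```shell
# Hypothetical sample data: "x" is duplicated inside A only.
tmp=$(mktemp -d)
printf 'x\nx\ny\nz\n' > "$tmp/A.txt"
printf 'y\nw\n' > "$tmp/B.txt"

# Naive intersection: also reports x, although x is not in B.
naive=$(sort "$tmp/A.txt" "$tmp/B.txt" | uniq -d)

# Deduplicate each file first, then keep lines that now occur twice.
fixed=$( { sort -u "$tmp/A.txt"; sort -u "$tmp/B.txt"; } | sort | uniq -d )

echo "naive:"; echo "$naive"
echo "fixed:"; echo "$fixed"
rm -rf "$tmp"
```

With this sample data the naive pipeline lists both x and y, while the deduplicated one lists only y.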

Also, the join and comm methods may not work reliably on large data sets: I get no intersection on my Abig and Bbig sets. The nifty grep method seems to work best.
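
The Abig and Bbig sets are not shown, so the cause of the empty result cannot be confirmed here. One thing that may be worth checking, purely as an assumption, is whether both inputs were sorted the same way: comm and join expect sorted input, and sort and comm need to agree on collation order. A minimal sketch with hypothetical files, forcing the C locale on both steps:

```shell
# Hypothetical sample data; A.txt is deliberately unsorted.
tmp=$(mktemp -d)
printf 'b\na\n' > "$tmp/A.txt"
printf 'a\nc\n' > "$tmp/B.txt"

# Sort both inputs with the same collation before comparing.
LC_ALL=C sort "$tmp/A.txt" > "$tmp/A.sorted"
LC_ALL=C sort "$tmp/B.txt" > "$tmp/B.sorted"

# -12 suppresses columns 1 and 2, leaving only lines common to both files.
common=$(LC_ALL=C comm -12 "$tmp/A.sorted" "$tmp/B.sorted")

echo "common: $common"
rm -rf "$tmp"
```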
