• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Treeset vs Arraylist vs PriorityQueue

 
Greenhorn
Posts: 24
1
  • Number of slices to send:
    Optional 'thank-you' note:
I'm trying to compare performance of `PriorityQueue`, `ArrayList` and `TreeSet` for these sequence of operations:
- insert descending `n` integers into the structure
- get the minimum value after all integers have been inserted into the structure





my code looks like so:


my instinct tells me that `ArrayList` should be the slowest since it's using array sorting and it would involve shifting elements in array, which we know should be very slow. both `PriorityQueue` and `TreeSet` should only involve swapping array element, and since `PriorityQueue` doesnt really sort, it should have slight advantage to `TreeSet`. but the result is quite surprising.
the last 3 results are

   pq duration 144 ms
   list duration 22 ms
   set duration 176 ms

   pq duration 154 ms
   list duration 50 ms
   set duration 124 ms

   pq duration 141 ms
   list duration 22 ms
   set duration 148 ms

so, there are 2 surprises here:
- ArrayList is ALWAYS faster quite significantly than both
- at first, `PriorityQueue` wins, but starting 1M elements, `TreeSet` is faster

Can anybody help me understand these 2 surprises?

Thanks

 
Marshal
Posts: 28258
95
  • Number of slices to send:
    Optional 'thank-you' note:

Mark Sando wrote:my instinct tells me that `ArrayList` should be the slowest since it's using array sorting and it would involve shifting elements in array, which we know should be very slow. both `PriorityQueue` and `TreeSet` should only involve swapping array element



I would have expected sorting the elements of an array to be done via repeatedly swapping elements of the array. But it seems that you know that "shifting elements" is used instead. (I'm not sure what that means.) So I'm not at all surprised that sorting an ArrayList is faster than those other data structures.

And I would have expected a TreeSet to be based on a map and not an array, so I wouldn't have expected that sorting a TreeSet would involve swapping elements of an array. As for comparing PriorityQueue and TreeSet, I don't have any intuition about those.
 
Paul Clapham
Marshal
Posts: 28258
95
  • Number of slices to send:
    Optional 'thank-you' note:
Also, welcome to the Ranch!
 
Marshal
Posts: 79392
377
  • Number of slices to send:
    Optional 'thank-you' note:
Paul, I would have thought that a tree set is based on a binary search tree, not a hash table or similar. That is how a hash set is implemented. TreeSet is based on TreeMap, which uses a red‑black tree. It doesn't use an array.

Welcome to the Ranch

Mark Sando wrote:. . . `ArrayList` should be the slowest since it's using array sorting . . .

What makes you think that. It is not necessary to sort the List. As Paul said, swapping elements in arrays is fast. We are not familiar with any means of shifting elements, only swapping. Adding elements to an array, or finding those elements by index, runs in constant time (for an ArrayList, amortised constant time). You appear to be adding elements, but I can't see where you are finding the minimum. I would suggest you don't use the milliseconds method, but this instead. I also suggest your testing run dummy runs before you start timing so as to allow any optimisations in the JVM to take full effect. You will see that the timings vary greatly between subsequent runs (that is normal when you do this sort of exercise), but the ordering of timings seems to be constant.
If you add elements to a simple binary search tree, adding will run in logarithmic time if they are in random order, but if they are added already sorted, the tree will add all elements either to the right or to the left, and it will degenerate to being a linked list. Adding then degenerates to linear time. A red‑black tree sorts out that problem as it goes, but it takes time to re‑balance the tree. I don't know the exact implementation of PriorityQueue, but it says that the adding operations run in logarithmic time. Similarly adding to a TreeSet (link above) runs in logarithmic time.
Searching those collections for a minimum value would run in constant time for PriorityQueue, in logarithmic time for a tree set, and in linear time for a list because you would have to traverse the List. So I should expect that an array list would give the fastest performance, and also, is seems that the information was all available in the API documentation.
 
Campbell Ritchie
Marshal
Posts: 79392
377
  • Number of slices to send:
    Optional 'thank-you' note:
The List#sort() method tells us it uses a kind of merge sort and that runs in nlog(n) time. It may run in much faster time if you present it with a sorted List. Even sorted in reverse order.
 
Mark Sando
Greenhorn
Posts: 24
1
  • Number of slices to send:
    Optional 'thank-you' note:
Thank you all for the warm welcome!

ohh, I see...I thought that Timsort (like Selection sort) involves shifting elements around like this:
consider the original state of a list : [2,3,4,1]
to sort this list:
* I'll put 1 in temp var
* 4 is now shifted right from position 3 to occupy 4th position -> [2,3,4,4]
* shift 3 from pos 2 to 3 -> [2,3,3,4]
* shift 2 from pos 1 to 2 -> [2,2,3,4]
* and finally  put 1 in the first slot [1,2,3,4]

so, it seems that Timsort only swaps elements around?

if it is so, then the explanation of this phenomenon is that priorityqueue and treeset swap elements around waaay more times than arraylist? so, araylist very rarely swap elements compared to priorityqueue and treeset is that all there is to it to explain this?
I modified the code to show how to get the minimum value. basically, it's just accessing the first / head element after done sorting the structure.
 
Campbell Ritchie
Marshal
Posts: 79392
377
  • Number of slices to send:
    Optional 'thank-you' note:

Mark Sando wrote:. . . priorityqueue and treeset swap elements around waaay more times than arraylist? . . .

No. You are handling an already sorted List; its behaviour on sorting is different from an unsorted List.

Besides, the different types of collection are not there to supply different performances; they are there to supply different functionality. It is a little like trying to find out whether a screwdriver or a hammer is faster at putting screws into wood. Would it be different with nails? Don't use collections because they are fastest to sort. Use them because they do what you need.
 
Campbell Ritchie
Marshal
Posts: 79392
377
  • 1
  • Number of slices to send:
    Optional 'thank-you' note:
Please don't edit posts after they have been replied to; that can make the answers confusing. Please post any changes in a new post. I am reverting the edits.
 
Paul Clapham
Marshal
Posts: 28258
95
  • Number of slices to send:
    Optional 'thank-you' note:

Mark Sando wrote:if it is so, then the explanation of this phenomenon is that priorityqueue and treeset swap elements around waaay more times than arraylist? so, araylist very rarely swap elements compared to priorityqueue and treeset is that all there is to it to explain this?



Not really, no. Notice that your code doesn't actually sort the PriorityQueue and TreeSet because they don't have sort() methods. This is because sorting a TreeSet, at least, doesn't mean anything. You don't access a TreeSet by entry number because that isn't a meaningful concept. Your code adds Integer objects to a TreeSet, which simply builds a more and more complex tree of (in your case) Integer objects. No swapping is involved because there's no underlying array. I expect that mostly applies to PriorityQueue too.
 
Mark Sando
Greenhorn
Posts: 24
1
  • Number of slices to send:
    Optional 'thank-you' note:
oh, sorry, let me rephrase that. so, the 3 structures achieve the same effect -> accept a bunch of integers and find the k smallest integers among them. and I'm comparing among these 3 approaches which one is faster. this is the underlying objective of this experiment. so, even when priorityqueue and treeset dont have explicit sort method, but they automatically "sort" incoming elements.
 
Campbell Ritchie
Marshal
Posts: 79392
377
  • Number of slices to send:
    Optional 'thank-you' note:

Mark Sando wrote:. . . the 3 structures achieve the same effect . . .

Only if you torture them a bit. You didn't say anything about lowest k elements yesterday.

. . . they automatically "sort" incoming elements.

Only a tree set sorts incoming elements; if you compare the results of using the Iterator on a priority queue to the results from poll(), you will find they are different. As I said, I don't know how priority queues are implemented, but any sorting is done later. Lists maintain insertion order if you use the ordinary add(E) method. The three data structures have different functions. I worry a bit about people being taught about fast execution; speed should be one of the lowest priorities when you program anything. Of course, it is quite all right to test things and see what happens if . . . In fact I think, “I only want to see what happens if . . . ,” is a very good way to learn.
As I said last night, you are supplying your input already sorted, albeit backwards. I suggest you try the following to get the lowest k numbers out of your List:-
  • 1: Populate the List in (backwards) sorted fashion: no change from what you are doing now.
  • 2: Use the subList() method to take the last k elements as a List.
  • 3: Copy that List into a new object (=new ArrayList(myList.siblist(123, 456))) You would probably use arguments like myList.size() − 1. I shall let you get the arithmetic right
  • 3½: Teach me how to spelll.
  • 4: Use Collections.reverse().

  • Another thing about arrays: to insert an element, it is only necessary to assign it; that is why the ArrayList documentation says,

    . . . operations run in constant time. The add operation runs in amortized constant time . . . The constant factor is low compared to that for the LinkedList implementation. . . .

    In the case of a linked list, adding an element entails creating a node object, and the few nanoseconds that takes is what slows it down. Maybe a priority queue creates something analogous to nodes; a tree set will almost certainly have nodes too. So that will slow down both those implementations.
     
    Campbell Ritchie
    Marshal
    Posts: 79392
    377
    • 1
    • Number of slices to send:
      Optional 'thank-you' note:

    Paul Clapham wrote:. . . a TreeSet . . . builds a more and more complex tree . . . No swapping is involved . . .

    OP is adding the numbers already sorted; that means the tree, being a red‑black tree, will have to rebalance itself to avoid degenerating to a linked list as a plain simple binary tree would. Don't know whether that slows the implementation down; I think creating multiple node objects might be the deciding factor.

    By the way: I suspect that performance would be different if the numbers were supplied at random.
     
    Marshal
    Posts: 8880
    638
    • Number of slices to send:
      Optional 'thank-you' note:
    Mark Sando, cowgratulations, your topic has been published in our CodeRanch Journal.
     
    Greenhorn
    Posts: 1
    • Number of slices to send:
      Optional 'thank-you' note:
    @Mark Sando

    In the scenario you provided the ArrayList should always be the fastest since it is sorted only once after the entire list of n=1,000,000 number has been created,  while the other structures maintain some kind of ordering information at all times.

    If we change the scenario to get the minimal element after each update to the list, then you would have to sort ArrayList n times (once after each element is added to the ArrayList). The other ordered structures would continue to work as they are without any addition sorting required. In this kind of scenario where you need to maintain an ordered list at all times ArrayList will most certainly be a lot slower.





     
    Campbell Ritchie
    Marshal
    Posts: 79392
    377
    • Number of slices to send:
      Optional 'thank-you' note:
    Welcome to the Ranch

    You may have a long wait for a reply from MS; he has been inactive for several months.
    You would have to test the performance, but even when you have worked out which data structure is fastest, please note what I said earlier:-
  • 1: The different data structures have different purposes, and it is a bad idea to compare them by performance.
  • 2: In the case of the array list, it is unnecessary to sort it to find the largest or smallest elements, but you have to do a linear search.
  • Remember that adding to an array list runs in amortised constant time, so the performances will differ depending on how often you do the search. I think the most important thing is to remember the different purposes of those data structures and going for speed of performance can simply cause you problems.
     
    Master Rancher
    Posts: 4905
    74
    • Number of slices to send:
      Optional 'thank-you' note:
    I think Axel did correctly answer why, in the original post, the ArrayList appeared to be fastest, given the way that test was coded.  That specific point had not been made.  But the additional caveats provided by Campbell and others are still valid.
     
    Don't get me started about those stupid light bulbs.
    reply
      Bookmark Topic Watch Topic
    • New Topic
    vceplus-200-125    | boson-200-125    | training-cissp    | actualtests-cissp    | techexams-cissp    | gratisexams-300-075    | pearsonitcertification-210-260    | examsboost-210-260    | examsforall-210-260    | dumps4free-210-260    | reddit-210-260    | cisexams-352-001    | itexamfox-352-001    | passguaranteed-352-001    | passeasily-352-001    | freeccnastudyguide-200-120    | gocertify-200-120    | passcerty-200-120    | certifyguide-70-980    | dumpscollection-70-980    | examcollection-70-534    | cbtnuggets-210-065    | examfiles-400-051    | passitdump-400-051    | pearsonitcertification-70-462    | anderseide-70-347    | thomas-70-533    | research-1V0-605    | topix-102-400    | certdepot-EX200    | pearsonit-640-916    | itproguru-70-533    | reddit-100-105    | channel9-70-346    | anderseide-70-346    | theiia-IIA-CIA-PART3    | certificationHP-hp0-s41    | pearsonitcertification-640-916    | anderMicrosoft-70-534    | cathMicrosoft-70-462    | examcollection-cca-500    | techexams-gcih    | mslearn-70-346    | measureup-70-486    | pass4sure-hp0-s41    | iiba-640-916    | itsecurity-sscp    | cbtnuggets-300-320    | blogged-70-486    | pass4sure-IIA-CIA-PART1    | cbtnuggets-100-101    | developerhandbook-70-486    | lpicisco-101    | mylearn-1V0-605    | tomsitpro-cism    | gnosis-101    | channel9Mic-70-534    | ipass-IIA-CIA-PART1    | forcerts-70-417    | tests-sy0-401    | ipasstheciaexam-IIA-CIA-PART3    | mostcisco-300-135    | buildazure-70-533    | cloudera-cca-500    | pdf4cert-2v0-621    | f5cisco-101    | gocertify-1z0-062    | quora-640-916    | micrcosoft-70-480    | brain2pass-70-417    | examcompass-sy0-401    | global-EX200    | iassc-ICGB    | vceplus-300-115    | quizlet-810-403    | cbtnuggets-70-697    | educationOracle-1Z0-434    | channel9-70-534    | officialcerts-400-051    | examsboost-IIA-CIA-PART1    | networktut-300-135    | teststarter-300-206    | pluralsight-70-486    | coding-70-486    | freeccna-100-101    | digitaltut-300-101    | iiba-CBAP    | virtuallymikebrown-640-916    | isaca-cism    | whizlabs-pmp    | techexams-70-980    | ciscopress-300-115    | techtarget-cism    | pearsonitcertification-300-070    | testking-2v0-621    | isacaNew-cism    | simplilearn-pmi-rmp    | simplilearn-pmp    | educationOracle-1z0-809    | education-1z0-809    | teachertube-1Z0-434    | villanovau-CBAP    | quora-300-206    | certifyguide-300-208    | cbtnuggets-100-105    | flydumps-70-417    | gratisexams-1V0-605    | ituonline-1z0-062    | techexams-cas-002    | simplilearn-70-534    | pluralsight-70-697    | theiia-IIA-CIA-PART1    | itexamtips-400-051    | pearsonitcertification-EX200    | pluralsight-70-480    | learn-hp0-s42    | giac-gpen    | mindhub-102-400    | coursesmsu-CBAP    | examsforall-2v0-621    | developerhandbook-70-487    | root-EX200    | coderanch-1z0-809    | getfreedumps-1z0-062    | comptia-cas-002    | quora-1z0-809    | boson-300-135    | killtest-2v0-621    | learncia-IIA-CIA-PART3    | computer-gcih    | universitycloudera-cca-500    | itexamrun-70-410    | certificationHPv2-hp0-s41    | certskills-100-105    | skipitnow-70-417    | gocertify-sy0-401    | prep4sure-70-417    | simplilearn-cisa    |
    http://www.pmsas.pr.gov.br/wp-content/    | http://www.pmsas.pr.gov.br/wp-content/    |