It is currently 11 Sep 2025, 23:28
   
Text Size

Please stop using HashSet, HashMap, and Hashtable

Post MTG Forge Related Programming Questions Here

Moderators: timmermac, Blacksmith, KrazyTheFox, Agetian, friarsol, CCGHQ Admins

Please stop using HashSet, HashMap, and Hashtable

Postby Braids » 11 Jul 2011, 04:41

maybe people have been following my work and don't need to be reminded, but i think it's important for me to make it clear what's going on and how other coders can help.

forge uses too much heap. i've reduced a fair portion of that in my efforts to purge use of forge.card.cardFactory.getAllCards and .getCards. i am now working on another heap waster: hashes. trees are more heap efficient and still have spiffy performance.

i am going through the code now, converting variable, field, parameter, and return types from Hashtable, HashMap, and HashSet to Map, Map, and Set, respectively. i am also converting instances to use TreeMap, TreeMap, and TreeSet implementations where feasible.

feasible? yes. there is one important subtle fact about trees. the keys of a TreeMap and the items in a TreeSet must have a "natural ordering". this is java's fancy way of saying that these things need to implement Comparable<T>. if your key (or set's item) type is forge.Card, String, an enum type, or any of Java's primitive or numeric classes like Integer, you're fine. if not, you need to make sure the key (or set's item) type implements Comparable correctly. you can write your own. for an example of a homemade Comparable, see forge.Card.compareTo. if you aren't confident enough for that, use a generic Map or Tree-typed variable/field/parameter/return type, but use {new HashTree} or {new HashMap} for the actual value.

here are some concrete examples.

Code: Select all
//No:
HashMap<String,String> variableName = new HashMap<String,String>();
HashSet<String> variableName2 = new HashSet<String>();
//Instead:
Map<String,String> variableName = new TreeMap<String,String>();
Set<String> variableName2 = new TreeSet<String>();

//No:
public HashMap<Object,String> methodName()
public HashSet<Object> methodName2()
//Instead:
public Map<Object,String> methodName()
public Set<Object> methodName2()

//No:
public void methodName3(HashMap<Object,String> map)
public void methodName4(HashSet<Object> set)
//Instead:
public void methodName3(Map<Object,String> map)
public void methodName4(Set<Object> set)

//No:
public MyClass {
    private HashMap<String,String> fieldName;
    private HashSet<String> fieldName2;
    ...
        fieldName = new HashMap<String,String>();
        fieldName2 = new HashSet<String>();
    ...
}
//Instead:
public MyClass {
    private Map<String,String> fieldName;
    private Set<String> fieldName2;
    ...
        fieldName = new TreeMap<String,String>();
        fieldName2 = new TreeSet<String>();
    ...
}
now, if your key or set item type's class isn't Comparable, you can use these for the field and variable declarations

Code: Select all
//No:
HashMap<Incomparable,String> variableName = new HashMap<Incomparable,String>();
HashSet<Incomparable> variableName2 = new HashSet<Incomparable>();
//Instead:
Map<Incomparable,String> variableName = new HashMap<Incomparable,String>();
Set<Incomparable> variableName2 = new HashSet<Incomparable>();

//No:
public MyClass {
    private HashMap<Incomparable,String> fieldName;
    private HashSet<Incomparable> fieldName2;
    ...
        fieldName = new HashMap<Incomparable,String>();
        fieldName2 = new HashSet<Incomparable>();
    ...
}
//Instead:
public MyClass {
    private Map<Incomparable,String> fieldName;
    private Set<Incomparable> fieldName2;
    ...
        fieldName = new HashMap<Incomparable,String>();
        fieldName2 = new HashSet<Incomparable>();
    ...
}
if you have questions, comments, or concerns, please feel free to post a reply. thanks.
"That is the dumbest thing I've ever seen." --Rob Cashwalker, regarding Innistrad double-sided cards. One of the first times he and I have ever agreed on something. ;)
User avatar
Braids
Programmer
 
Posts: 556
Joined: 22 Jun 2011, 00:39
Location: Unknown. Hobby: Driving myself and others to constructive madness.
Has thanked: 1 time
Been thanked: 1 time

Re: Please stop using HashSet, HashMap, and Hashtable

Postby Chris H. » 11 Jul 2011, 11:54

Braids wrote:if you have questions, comments, or concerns, please feel free to post a reply. thanks.
`
A lot of this stuff is over my head so I will just give you a quick thank you for your efforts. :)
User avatar
Chris H.
Forge Moderator
 
Posts: 6320
Joined: 04 Nov 2008, 12:11
Location: Mac OS X Yosemite
Has thanked: 644 times
Been thanked: 643 times

Re: Please stop using HashSet, HashMap, and Hashtable

Postby Rob Cashwalker » 11 Jul 2011, 14:04

So, basically you're telling us to ditch the Hash, and switch to Trees...? :rolleyes:
The Force will be with you, Always.
User avatar
Rob Cashwalker
Programmer
 
Posts: 2167
Joined: 09 Sep 2008, 15:09
Location: New York
Has thanked: 5 times
Been thanked: 40 times

Re: Please stop using HashSet, HashMap, and Hashtable

Postby jeffwadsworth » 11 Jul 2011, 16:30

jeffwadsworth
Super Tester Elite
 
Posts: 1172
Joined: 20 Oct 2010, 04:47
Location: USA
Has thanked: 287 times
Been thanked: 70 times

Re: Please stop using HashSet, HashMap, and Hashtable

Postby Braids » 11 Jul 2011, 22:28

jeffwadsworth has rightly placed me in a position of reevaluating this situation. thank you for your post, jeff.

first, the punchline.

Rob Cashwalker wrote:So, basically you're telling us to ditch the Hash, and switch to Trees...? :rolleyes:
i'm not sure yet. in the meantime, i still think it is smart to use generic Set or Tree interfaces for any new variable, field, parameter, and return types. that allows us to change the implementation to hash or tree on a case by case basis, without touching more than one file. some examples.

Code: Select all
// use Map instead of Hashmap here:
Map<W,X> methodName(Map<Y,Z> parameterName); 

// use generic interface type for variable or field, but
// (for now) still use new Hash... for actual contents:
Map<X,Y> variableOrFieldName = new HashMap<X,Y>)(); 
jeffwadsworth wrote:http://download.oracle.com/javase/tutorial/collections/implementations/set.html

Interesting discussion here also.

http://stackoverflow.com/questions/1463 ... vs-treeset
:-k thank you, jeff. that was very tasteful and enlightening. the case for hashes over trees is convincing.

i am now in a state of uncertainty.

however, i know for a fact that after i switched CardFactory from a HashMap to a TreeMap, its heap use changed by quite a bit. i don't have the hprof files around anymore to prove it exactly. i can attest that with the default maximum heap size, the program ran longer before it got an OutOfMemoryError, and CardFactory had a smaller percentage of the total heap allocation at the time it finally ran out of memory.

trees are slower. i thought the performance was good, but logarithmic time {O(log2(n))} is considerably slower than constant time {O(1)}. but that starts to become irrelevant if the Java VM forces forge to use virtual memory instead of physical memory, as timmermac's experiences seem to indicate on timmermac's Windows OS.

one of the authors of the above topics recommended that hashes be twice as big as they need to be. that's a lot of wasted space, no?

how much CPU performance are we willing to sacrifice to save on heap space? should we skip this possibile misstep and go straight for finding memory leaks? what about making the game state smaller for minimax?

here is the only answer i can come up with right now. i have to gather more factual data about heap use in recent revs of forge and share my findings.

in the meantime, as mentioned above, i still think it is smart to use generic Set or Tree interfaces for any new variable, field, parameter, and return types. that allows us to change the implementation to hash or tree on a case by case basis, without touching more than one file.

further comments are much appreciated. thank you.
"That is the dumbest thing I've ever seen." --Rob Cashwalker, regarding Innistrad double-sided cards. One of the first times he and I have ever agreed on something. ;)
User avatar
Braids
Programmer
 
Posts: 556
Joined: 22 Jun 2011, 00:39
Location: Unknown. Hobby: Driving myself and others to constructive madness.
Has thanked: 1 time
Been thanked: 1 time

Amazingly reduced heap usage at r10644!

Postby Braids » 11 Jul 2011, 23:24

well, at r10644, i ran the deck editor and played a few turns of a randomly generated constructed game with -Xmx175M. yes, just three digits. it was playable. i couldn't get it to die with an OutOfMemoryError until i lowered it to -Xmx100M. in this rev, both CardFactory and Card are using trees instead of hashes.
"That is the dumbest thing I've ever seen." --Rob Cashwalker, regarding Innistrad double-sided cards. One of the first times he and I have ever agreed on something. ;)
User avatar
Braids
Programmer
 
Posts: 556
Joined: 22 Jun 2011, 00:39
Location: Unknown. Hobby: Driving myself and others to constructive madness.
Has thanked: 1 time
Been thanked: 1 time

Re: Please stop using HashSet, HashMap, and Hashtable

Postby friarsol » 12 Jul 2011, 00:18

Maybe we should be creating Hashes with a certain length, this way they aren't creating fat structures that take up the whole heap, but are still fast for accessing. For AF and company we know what the size of each Hash is going to be on creation and we never add to them, making them more suitable than Trees.

Actually, I'm not sure if any of our structures have elements added to them after the initial construction. I don't think the variables themselves need to be set as just Set or Map, just the function definitions, so if the variable is changed at some point the generic can still handle it.

Are we sure that trees take up less space than Hashes?

3 Nodes == 2 Leaves * 2 = 4 (+3) = 7
7 Nodes == 4 Leaves * 2 = 8 (+7) = 15
15 Nodes == 8 Leaves * 2 = 16 (+15) = 31

Where on a hash we have
3 Nodes (*2 Buffer) = 6
7 Nodes (*2 Buffer) = 14
15 Nodes (*2 Buffer) = 30
friarsol
Global Moderator
 
Posts: 7593
Joined: 15 May 2010, 04:20
Has thanked: 243 times
Been thanked: 965 times

Re: Please stop using HashSet, HashMap, and Hashtable

Postby Braids » 12 Jul 2011, 01:42

friarsol wrote:Are we sure that trees take up less space than Hashes?

3 Nodes == 2 Leaves * 2 = 4 (+3) = 7
7 Nodes == 4 Leaves * 2 = 8 (+7) = 15
15 Nodes == 8 Leaves * 2 = 16 (+15) = 31

Where on a hash we have
3 Nodes (*2 Buffer) = 6
7 Nodes (*2 Buffer) = 14
15 Nodes (*2 Buffer) = 30
your logic is impeccable. which leads me to wonder, "what was i thinking?"

i guess the recent reduction in forge's heap usage has more to do with my changes from getAllCards to Generators than with any Tree usage.

i'll have to think about it for another day or two, but i'll probably end up redacting the original post. thanks for using your brain.

edit 1: as shown.
Last edited by Braids on 12 Jul 2011, 23:20, edited 1 time in total.
"That is the dumbest thing I've ever seen." --Rob Cashwalker, regarding Innistrad double-sided cards. One of the first times he and I have ever agreed on something. ;)
User avatar
Braids
Programmer
 
Posts: 556
Joined: 22 Jun 2011, 00:39
Location: Unknown. Hobby: Driving myself and others to constructive madness.
Has thanked: 1 time
Been thanked: 1 time

Re: Please stop using HashSet, HashMap, and Hashtable

Postby friarsol » 12 Jul 2011, 02:10

Braids wrote:your logic is impeccable. which leads me to wonder, "what was i thinking?"

i guess the recent reduction in forge's heap usage has more to do with my changes from getAllCards to Generators than with any Tree usage.

i'll have to think about it for another day or two, but i'll probably end up redacting the original post. thanks for using your brain.
Have no doubt, there has been a ton of performance improvements lately and there is plenty of fat still to trim.

Since trees are easily the way to go for MinMax Game States that probably led you down the line of thinking "Let's have trees everywhere." We're all here to try to improve Forge, and it's a vast improvement since even 6 months ago.

What we should do is make sure we are assigning a size for each Hash to alleviate some potential performance hits (slow downs to resize, and possibly extra wasted space beyond the recommended doubled size).
friarsol
Global Moderator
 
Posts: 7593
Joined: 15 May 2010, 04:20
Has thanked: 243 times
Been thanked: 965 times

Re: Now that i've thought more about it and collected some d

Postby Braids » 12 Jul 2011, 23:16

friarsol wrote:Maybe we should be creating Hashes with a certain length, this way they aren't creating fat structures that take up the whole heap, but are still fast for accessing.
it won't help. according to Oracle, "Internally, the capacity [of a hash] is always rounded up to a power of two."
friarsol wrote:For AF and company we know what the size of each Hash is going to be on creation and we never add to them, making them more suitable than Trees.
actually, my research indicates the opposite. it is below.
friarsol wrote:Actually, I'm not sure if any of our structures have elements added to them after the initial construction.
you may be right for the most part; one counterexample is that counters on cards are implemented as Maps. these change during the game.
friarsol wrote:I don't think the variables themselves need to be set as just Set or Map, just the function definitions, so if the variable is changed at some point the generic can still handle it.
this is true; i just think it's a better practice to get in the habit of using the generic type. besides, i'm talking about new code here. i'm taking care of existing code (if at all).
friarsol wrote:Are we sure that trees take up less space than Hashes?

3 Nodes == 2 Leaves * 2 = 4 (+3) = 7
7 Nodes == 4 Leaves * 2 = 8 (+7) = 15
15 Nodes == 8 Leaves * 2 = 16 (+15) = 31

Where on a hash we have
3 Nodes (*2 Buffer) = 6
7 Nodes (*2 Buffer) = 14
15 Nodes (*2 Buffer) = 30
you're almost right. hashes usually have more overhead than this, because a hash's capacity (in Java) is always a power of two.

note i use ** {double star} to indicate exponentiation, as in Python.

let's take our maximum card pool as an example. currently we have 8584 cards.

this means the hash needs the optimal hash has a capacity of at least 2 * 8584 = 17168. "Internally, the capacity [of a hash] is always rounded up to a power of two." (Oracle) ceiling(log2(17168)) = 16. 2 to the 16th power = 2**16 = 32768 buckets (pointers) in the hash for this many cards. i'll assume that if if there is a hash collision, entries are placed in successive buckets.

in general, the optimal hash's heap footprint is approximately (2**ceiling(log2(2n)) pointers. this can range from 2n to around 4n pointers, depending on how close 2n is to a power of two. but 2n is the minimum for the optimal hash.

now about trees. the math here is even easier. in general, every time one adds a node to a balanced binary tree, the number of null pointers (used to indicate leaf nodes) increases by one. so, a balanced binary tree has 2n+1 pointers, n+1 of which are null.

a tree with 8584 nodes has 2*8584 + 1 = 17169 pointers.

enough with the theory. i have some evidence.

at src r10641+, where Card, CardFactory, and Combat were converted to use trees instead of hashes. TPTP heap analyzer reports 17,169 instances of Card took up 7,004,952 bytes. eclipse MAT postmortem shows CardFactory.map uses 32,220,888 units of Retained Heap.

at src r10619-, where i temporarily converted CardFactory back to using hashes instead of trees. TPTP heap analyzer reports 17,169 instances of Card took up 7,004,952 bytes, which is the same number. however, eclipse MAT postmortem shows CardFactory.map uses 35,101,984 units of Retained Heap.

the hash implementation used 2,881,096 more units of retained heap. that is an increase of 8.9% in this case. this is less than i predicted, probably because the HashMap's load factor makes it use less memory than an optimal hash, which i described above.

i think this is enough to care about, because minimax needs to squeeze as much out of each game state as it can.

edit 1. rather, there was an 8.2% decrease in the amount of heap usage when i switched from hashes to trees. measuring the decrease is more important than measuring the increase, in my opinion.
Last edited by Braids on 13 Jul 2011, 01:07, edited 1 time in total.
"That is the dumbest thing I've ever seen." --Rob Cashwalker, regarding Innistrad double-sided cards. One of the first times he and I have ever agreed on something. ;)
User avatar
Braids
Programmer
 
Posts: 556
Joined: 22 Jun 2011, 00:39
Location: Unknown. Hobby: Driving myself and others to constructive madness.
Has thanked: 1 time
Been thanked: 1 time

Re: Please stop using HashSet, HashMap, and Hashtable

Postby Braids » 12 Jul 2011, 23:51

friarsol wrote:
Braids wrote:your logic is impeccable. which leads me to wonder, "what was i thinking?"
Have no doubt, there has been a ton of performance improvements lately and there is plenty of fat still to trim.

Since trees are easily the way to go for MinMax Game States that probably led you down the line of thinking "Let's have trees everywhere."
that probably influenced my thinking, but i remember more clearly now. i had forgotten about all the null pointers in a balanced tree. i thought the hash size was 2n and the tree size was n. the truth is, the tree size is closer to 2n+1. the hash? who knows. . . anywhere from 1.25*n to 4n, i think.
friarsol wrote:. . . What we should do is make sure we are assigning a size for each Hash to alleviate some potential performance hits (slow downs to resize, and possibly extra wasted space beyond the recommended doubled size).
i think the best you can do is reduce the capacity to 2**ceiling(log2(n+1)).

8.9% an 8.2% decrease . . . maybe i can squeeze that performance out later.
"That is the dumbest thing I've ever seen." --Rob Cashwalker, regarding Innistrad double-sided cards. One of the first times he and I have ever agreed on something. ;)
User avatar
Braids
Programmer
 
Posts: 556
Joined: 22 Jun 2011, 00:39
Location: Unknown. Hobby: Driving myself and others to constructive madness.
Has thanked: 1 time
Been thanked: 1 time

Re: Please stop using HashSet, HashMap, and Hashtable

Postby friarsol » 20 Jul 2011, 22:37

Braids did you say at some point that there is something wrong with Iterators if you don't Iterate through the whole Iteration?
friarsol
Global Moderator
 
Posts: 7593
Joined: 15 May 2010, 04:20
Has thanked: 243 times
Been thanked: 965 times

Re: Please stop using HashSet, HashMap, and Hashtable

Postby Braids » 21 Jul 2011, 03:44

friarsol wrote:Braids did you say at some point that there is something wrong with Iterators if you don't Iterate through the whole Iteration?
only when using an iterator created by YieldUtils.toIterator, which would mean you were working with Generator objects.

besides using the Visitor design pattern, YieldUtils.toIterator is one of the most memory efficient and fastest ways to fetch actual items from a non-infinite Generator. the problem is, it creates a temporary thread to do so. that thread might not die until the Iterator has been fully exercised. then again, the thread may be subject to garbage collection. i'm not sure, so i have been advising caution.

if you're dealing with an infinite Generator (and you probably aren't), please use the Visitor pattern instead of iterators.

when i proliferated Generator usage in Forge, i took care to pass Generators around as much as possible, instead of iterators or iterables. that way, the decision to use YieldUtils.toIterator could be deferred procrastinated to avoid possible thread leakage. you can have faith that the iterables i created or changed will not give you one of these thread mongers.

but like i said, YieldUtils' threads might get garbage collected.
"That is the dumbest thing I've ever seen." --Rob Cashwalker, regarding Innistrad double-sided cards. One of the first times he and I have ever agreed on something. ;)
User avatar
Braids
Programmer
 
Posts: 556
Joined: 22 Jun 2011, 00:39
Location: Unknown. Hobby: Driving myself and others to constructive madness.
Has thanked: 1 time
Been thanked: 1 time

Re: Please stop using HashSet, HashMap, and Hashtable

Postby Rob Cashwalker » 21 Jul 2011, 13:53

This generator thing has got me all confused now.

Let's say I need a cardlist like from CardFactory.getCards() (before being deprecated) then filter that list for cards meeting some criteria.

Do I now write a loop that does iterator.next(), check for the properties, then add it to my own CardList?
The Force will be with you, Always.
User avatar
Rob Cashwalker
Programmer
 
Posts: 2167
Joined: 09 Sep 2008, 15:09
Location: New York
Has thanked: 5 times
Been thanked: 40 times

Re: Please stop using HashSet, HashMap, and Hashtable

Postby Braids » 21 Jul 2011, 17:15

Rob Cashwalker wrote:This generator thing has got me all confused now.
thank you for giving me the opporunity to demonstrate the usefulness of Generators.

Rob Cashwalker wrote:Let's say I need a cardlist like from CardFactory.getCards() (before being deprecated) then filter that list for cards meeting some criteria . . . Do I now write a loop that does iterator.next(), check for the properties, then add it to my own CardList?
you can do that, but i would not suggest it unless the number of items in your CardList is definitely small, say, less than 256 cards. Generators are much more efficient.

here is how you can do card filtering with Generators.

Code: Select all
import net.slightlymagic.braids.util.generator.GeneratorFunctions;
import net.slightlymagic.braids.util.lambda.Lambda1;

import com.google.code.jyield.Generator;
//...
        // You can think of Lambda as a fancy name for "function treated as an
        // object". This Lambda's angle brackets indicate it returns a Boolean
        // and takes a single Card parameter.  The digit one (1) means this
        // Lambda accepts one parameter.
        //
        Lambda1<Boolean,Card> predicate = new Lambda1<Boolean,Card>() {
            public Boolean apply(Card c) {
                if (weWantThisCard) {  // use your own filter logic here.
                    return true;
                }
                else {
                    return false;
                }
            }
        };

        Generator<Card> filteredCards =
            GeneratorFunctions.filterGenerator(predicate, YieldUtils.toGenerator(AllZone.getCardFactory()));
you now have a Generator, named filteredCards, that will produce the cards you want at some point in the future. notice how it doesn't even use CardList. so, it's not taking up space with a temporary list that could be very large.

now, it gets really interesting if you want to apply multiple filters, chaining them together:
Code: Select all
        Lambda1<Boolean,Card> predicate2 = new Lambda1<Boolean,Card>() { //...
        };
        Lambda1<Boolean,Card> predicate3 = new Lambda1<Boolean,Card>() { //...
        };

        filteredCards =
          GeneratorFunctions.filterGenerator(predicate2, filteredCards);
        filteredCards =
          GeneratorFunctions.filterGenerator(predicate3, filteredCards);
even though we have applied multiple filters to filteredCards, we still have not created any temporary lists! instead, the generators have been linked together behind the scenes along with their predicates.

one can also apply transformations to Generators, to change one or all of the cards in the Generator. but i won't cover those here.

once you've done all the filtering (and transformations) you wanted with the Generators, you can then iterate through filteredCards, or create a CardList from it:

Code: Select all
        for (Card card : YieldUtils.toIterable(filteredCards)) {
            //...
            // Please do not use break inside this loop. Otherwise, a memory
            // or thread leak may result.
        }
note the use of the handy but potentially leaky YieldUtils.toIterable. just don't break out of that loop and it should be fine.

if you absolutely need a list, there is a forge.CardList constructor that accepts a Generator of Cards as its parameter:
Code: Select all
import forge.CardList;
//...
        CardList list = new CardList(filteredCards);
iterating is preferred, but if you need to present a sorted list of cards (e.g., in a deck editor), then CardList makes sense. The point is to defer procrastinate the iteration or the creation of the new CardList until the very last moment it is needed.

Edit 1. there are some static methods in forge.CardFilter that perform these very kinds of filters. these methods also acts as good examples for writing your own filters. at present, CardFilter has Generator based filters for rarity, color, and set. if any of these suit your needs, you can use one instead of writing your own predicates. if you think your predicate has potential to be reused, please consider adding a static method to CardFilter.
"That is the dumbest thing I've ever seen." --Rob Cashwalker, regarding Innistrad double-sided cards. One of the first times he and I have ever agreed on something. ;)
User avatar
Braids
Programmer
 
Posts: 556
Joined: 22 Jun 2011, 00:39
Location: Unknown. Hobby: Driving myself and others to constructive madness.
Has thanked: 1 time
Been thanked: 1 time


Return to Developer's Corner

Who is online

Users browsing this forum: No registered users and 37 guests

Main Menu

User Menu

Our Partners


Who is online

In total there are 37 users online :: 0 registered, 0 hidden and 37 guests (based on users active over the past 10 minutes)
Most users ever online was 7967 on 09 Sep 2025, 23:08

Users browsing this forum: No registered users and 37 guests

Login Form