Thursday, August 20, 2009

GAMESS, memory and parallel



The screencast above is a repeat from the last post, where I computed the frequencies of a molecule at the RHF/3-21G level of theory (second screencast) and discovered that the amount of memory that GAMESS requests (1,000,000 words) was not enough. While GAMESS tells me how much it needs in the output file, it does so after a lot of computation, which is wasted because I have to start over.

In this post I discuss the basics of GAMESS memory requirements. The simplest case is when you are not running in parallel, i.e. the number of processors in GAMESSQ is set to 1, so I discuss this case first.

1. Memory is specified in $system, and the default is 1 mega-word (i.e. 1,000,000 words):

$system mwords=1 $end

mwords can only take integer values (1, 2, 3, ...). (In the screencast above I used the older "memory" keyword, but mwords is much more convenient.)

2. A word is 8 bytes, so the default memory request (1 mega-word) is a very modest 8 MB of RAM.

3. This is the maximum amount of memory GAMESS is allowed to use. Depending on the type of calculation GAMESS might use less. The option is there to avoid filling up the memory on the computer entirely, which will crash or freeze the computer.

4. Here is the simplest way to deal with memory: My current laptop computer has 2 GB of RAM, and I use it for other things while GAMESS is running, so I am willing to give GAMESS a maximum of roughly 1 GB of RAM. This translates to

1 GB ≈ 1,000 MB ≈ 125 mwords

5. Adding

$system mwords=125 $end
to all input files will allow me to run most GAMESS jobs that I would want to run on a laptop without ever getting a "not enough memory" error (of course if you have less memory you need to adjust mwords accordingly).
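The arithmetic in points 2, 4 and 5 can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example and are not part of GAMESS, and 1 MB is taken as 1,000,000 bytes, as in the rough estimate above.

```python
BYTES_PER_WORD = 8  # a GAMESS word is 8 bytes (point 2)


def mwords_for_budget(budget_mb):
    """Whole number of mega-words that fits in budget_mb megabytes.

    Assumes 1 MB = 1,000,000 bytes, so 1 mega-word = 8 MB.
    """
    return int(budget_mb // BYTES_PER_WORD)


def system_group(mwords):
    """Format the $system line the way it is written in this post."""
    return f"$system mwords={mwords} $end"


if __name__ == "__main__":
    # A budget of roughly 1 GB (1,000 MB), as on my laptop
    print(system_group(mwords_for_budget(1000)))
```

With a 1,000 MB budget this gives mwords=125; with less memory available, pass a smaller number of megabytes.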

That's basically it. What follows are some more details that most casual users of GAMESS won't have to worry about.

6. Most common GAMESS runs will never get close to using 1 GB of RAM, meaning the memory is free to be used by other programs. The most common types of runs that potentially could use that much memory are frequency calculations using RHF and any kind of MP2 calculation. Slightly less common ones are TDDFT and NMR calculations. In GAMESS, DFT frequency calculations are done numerically, and do not require a lot of memory.

7. If you are in doubt whether you have requested enough memory to perform a calculation, you can use GAMESS itself to check, using

$contrl exetyp=check $end


This keyword simulates an actual GAMESS run but skips the most time-consuming steps, so it is very fast.

I show two examples (an RHF/3-21G frequency calculation and an MP2/6-31G(d) single-point calculation on 3TSa) in this screencast.



Parallel runs

8. Many desktop computers (and almost all larger computers) have more than one CPU core. For example, my current laptop has one processor with 2 cores. Thus, I could make my GAMESS calculations go faster by specifying 2 processors when submitting with GAMESSQ (processor = core in GAMESS-speak). This means that two separate but related GAMESS calculations are running simultaneously, and this affects the memory request:

9. Most types of runs use replicated memory. This means that if mwords=1 but I ask for 2 processors, GAMESS will use a maximum of 2 mega-words. Thus, if I routinely use 2 processors when running on my laptop and want to impose a limit of 1 GB RAM, I should use

$system mwords=63 $end

instead of 125.
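As a rough sketch of the replicated-memory arithmetic in point 9 (the function names are hypothetical, chosen for this example only):

```python
def mwords_per_core(budget_mwords, n_cores):
    """Largest per-core mwords such that n_cores * mwords stays within the budget."""
    return budget_mwords // n_cores


def replicated_total(mwords, n_cores):
    """Maximum replicated memory in mega-words: each core gets its own mwords."""
    return mwords * n_cores


if __name__ == "__main__":
    print(mwords_per_core(125, 2))   # strict limit for a 125 mega-word budget
    print(replicated_total(63, 2))   # total when using the value from the post
```

Rounding down gives 62 mega-words per core for a strict 125 mega-word limit. The value of 63 used above rounds 62.5 up, for a total of 126 mega-words, which is still roughly 1 GB.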

10. The most common exception to this is MP2 calculations, where GAMESS uses both replicated (mwords) and distributed memory (memddi). Distributed memory is memory shared by the cores. If I specify memddi=100 and ask for 2 processors then 100 mega-words of RAM is distributed among the 2 processors (the simplest case is that each core gets 50 mega-words, but GAMESS figures that out for you).

So running on 2 processors with

$system mwords=15 memddi=100 $end
requests a total of (15 + 15 + 100 =) 130 mega-words. You can use exetyp=check to figure out the optimum values of mwords and memddi, as I show in this screencast.
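The total for a run that uses both kinds of memory can be sketched as follows. The helper names are made up for this illustration, and 1 mega-word is taken as 8 MB (8 bytes per word):

```python
BYTES_PER_WORD = 8  # a GAMESS word is 8 bytes


def total_mwords(mwords, memddi, n_cores):
    """Replicated memory counts once per core; distributed memory counts once in total."""
    return n_cores * mwords + memddi


def total_mb(mwords, memddi, n_cores):
    """Same total expressed in MB, assuming 1 mega-word = 8 MB."""
    return total_mwords(mwords, memddi, n_cores) * BYTES_PER_WORD


if __name__ == "__main__":
    print(total_mwords(15, 100, 2))  # mega-words for the example above
    print(total_mb(15, 100, 2))      # the same request in MB
```

For mwords=15 and memddi=100 on 2 processors this gives 130 mega-words, or about 1,040 MB, which is the number to compare against the RAM you are willing to give GAMESS.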



The screencast shows how you can use 1 processor and

$contrl exetyp=check $end
$system memddi=100 parall=.t. $end


to check the memddi requirements. You need to make memddi large for this to work, but GAMESS will not use this memory during the check run. Note that parall only "does something" for check runs: true parallel runs are specified by choosing more than 1 processor when you submit GAMESS.

You can get a complete list of runs that require memddi in Chapter 2 of the GAMESS documentation under the entry for $system. Also, Chapter 5 has a more in-depth description of how memory is handled.

21 comments:

glenrs said...

Exactly the information, and more, that I've been after for some time now. Thanks!

Jan Jensen said...

Glen - you're very welcome. Glad to hear the post was of use.

-oh said...

hi!
thanks for explaining the memory usage in GAMESS. This is very helpful!
About running in parallel. How should one treat a dual processor system where there are TWO actual processors with SIX cores each. (a Mac Pro)
So far I've only been able to get 50% cpu utilization (i.e. one processor running all 6 cores, but the other one stagnating).
Thanks!

Jan Jensen said...

You should be able to use all 12 cores, so I am not sure what the problem is. Two common factors that can limit cpu utilization are (1) running something that writes to disk a lot (e.g. if you don't use dirscf=.t.) and (2) asking for more memory than you actually have.

I suggest posting your question to the Google GAMESS list, to see if other people have had similar problems.

I am happy to hear you found the post helpful

-oh said...

The solution to the 12 core problem was to type in 24! taking into account virtual cores. Magic of Mac :)
Thanks for your suggestions about the dirscf, that really sped things up!

Jan Jensen said...

Good to know! Thanks for reporting back on that.

regie said...

Hi, I installed GAMESS on a Beowulf cluster. I am using Rocks 5.3 running on CentOS 5.4 for my cluster. I have 1 frontend node with 4 cores and 5 work nodes with 2 cores each. What I want to know is how I can use the work nodes. Right now I can only use my frontend node, which has 4 cores. I want to utilize all work nodes. I issue the command ./rungms exam01 00 10 >& exam01.log, which I know means running my program on 10 cores, but as per my monitoring it only uses 4 cores from my frontend. Thanks in advance.

Jan Jensen said...

There are people on the google GAMESS group who know far more about that than me. I suggest posting your question there.

Anonymous said...

I know I'm a little late posting to this but WOW! This is really great information here. Thank you for posting it!

One quick question for you. In point 6 you mention most GAMESS runs will never get close to using 1GB of system memory. My understanding is that specifying mwords=130 doesn't guarantee that all of the allocated memory will be used by the simulation. (Hence it is possible to over-allocate memory)

Most RHF and MP2 calculations I've seen use very little memory. For my own curiosity, I was wondering what an example input file might look like that would need to use a large quantity(1GB-20GB+) of memory. Thanks.

Jan Jensen said...

Glad to hear you found the post useful!

"My understanding is that specifying mwords=130 doesn't guarantee that all of the allocated memory will be used by the simulation."

Correct

MP2 calculations on a large molecule (>50 heavy atoms) with a large basis set (e.g. aug-cc-pVQZ) will use a lot of memory.

Also, pretty much any CCSD(T) calculation will use more than 1 GB of RAM

Hernán Sánchez said...

Thanks a lot Jan! This post has saved me so much time, maybe weeks!!. I'm starting introducing myself into the world of calculations, and I was stucked with a calculation because memory issues.

Excuse my english pls.

Regards, Hernán

Jan Jensen said...

Glad to hear you found the post useful. Thanks for letting me know!

Anonymous said...

thank you so much. helped me a lot

Unknown said...

Hi Jan. First of all, very useful blog!
I am getting the 'not enough memory' error in my calculations, and changed mwords to 125 as you suggested. Of course 2 mwords would be enough, as GAMESS told me, but just in case. The thing is, even with this modification, GAMESS still recognizes only 1 mword. I have even tried the memory keyword, and the same result came up. Can you give me a little help with this?
Thank you very much Jan, keep up with the good work! Greetings from Brazil!

Jan Jensen said...

I would need to see the log file (dropbox, google docs, github, pastebin, ...)

Glad to hear you find the blog useful.

Unknown said...

Hi Jan. Can I email that to you? Or even create a dropbox shared folder, but I need to know you email account.

Thanks a lot!

Jan Jensen said...

You can simply paste the link to the file or folder here in the comments section. I'd like for everyone to have access to the file.

Unknown said...

Here is the link.

https://www.dropbox.com/s/knedchtjxizipgf/estrutura.log

Sorry for not sending it sooner. Thanks a lot for the help!

Jan Jensen said...

I'm not sure what the problem is, but it is not just with the MEMORY keyword. MAXIT and UNITS are also ignored. The remaining keywords are defaults, so it's unknown whether they are read correctly.

The $basis group is definitely read correctly. The input looks OK but GAMESS is not interpreting the text correctly. My best guess is a keyboard issue. Try copying commands from input files that work, or generate the files using MacMolPlt or Avogadro.

You may also want to post the question to the Google GAMESS group to see if anyone else has better ideas

Unknown said...

Ok Jan. I will try to rewrite the input. If any solution comes up, I will post it here later. Thank you for helping!

Unknown said...

Hi Jan. The input constructed using macmolplt worked very well. Problem solved! Thank you very much.