Problem with F@H

seagull
Elite Poster
Posts: 1606
Joined: 23 Nov 2006
Twitter: WimVerlinden
Location: Kortenberg
Been thanked: 56 times
Has thanked: 36 times
Contact:

Problem with F@H

Post by seagull » 11 Aug 2009, 10:03

I have a problem with F@H that keeps coming back.
I run two Xubuntu 64-bit VMs on VMware Server 1.8.
The first VM kept giving problems while folding, so I copied the second one, but now the problem is back.
The full log is below.
Anyone have a tip?

Code: Select all

[22:18:19] Completed 250000 out of 250000 steps  (100%)

Writing final coordinates.

 Average load imbalance: 127.3 %
 Part of the total run time spent waiting due to load imbalance: 71.1 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Z 0 %

NOTE: 71.1 % performance was lost due to load imbalance
      in the domain decomposition.


NOTE: 15 % of the run time was spent communicating energies,
      you might want to use the -nosum option of mdrun

   Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time: 173757.000 173757.000    100.0
                       2d00h15:57
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     72.276      3.040      0.241     99.521

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[22:18:21] DynamicWrapper: Finished Work Unit: sleep=10000
[22:18:25]
[22:18:25] Finished Work Unit:
[22:18:25] - Reading up to 21148416 from "work/wudata_03.trr": Read 21148416
[22:18:25] trr file hash check passed.
[22:18:25] - Reading up to 4533104 from "work/wudata_03.xtc": Read 4533104
[22:18:25] xtc file hash check passed.
[22:18:25] edr file hash check passed.
[22:18:25] logfile size: 188167
[22:18:25] Leaving Run
[22:18:26] - Writing 26014439 bytes of core data to disk...
[22:18:26]   ... Done.
Error encountered before initializing MPICH
[22:18:31] - Shutting down core
[22:18:31]
[22:18:31] Folding@home Core Shutdown: FINISHED_UNIT
[22:21:57] CoreStatus = 64 (100)
[22:21:57] Unit 3 finished with 31 percent of time to deadline remaining.
[22:21:57] Updated performance fraction: 0.515123
[22:21:57] Sending work to server
[22:21:57] Project: 2672 (Run 0, Clone 110, Gen 165)


[22:21:57] + Attempting to send results [August 10 22:21:57 UTC]
[22:21:57] - Reading file work/wuresults_03.dat from core
[22:21:57]   (Read 26014439 bytes from disk)
[22:21:57] Connecting to http://171.64.65.56:8080/
[22:45:06] Posted data.
[22:45:16] Initial: 0000; - Uploaded at ~18 kB/s
[22:45:16] - Averaged speed for that direction ~24 kB/s
[22:45:16] + Results successfully sent
[22:45:16] Thank you for your contribution to Folding@Home.
[22:45:16] + Number of Units Completed: 95

[22:45:19] - Warning: Could not delete all work unit files (3): Core file absent
[22:45:19] Trying to send all finished work units
[22:45:19] + No unsent completed units remaining.
[22:45:19] - Preparing to get new work unit...
[22:45:19] + Attempting to get work packet
[22:45:19] - Will indicate memory of 1004 MB
[22:45:19] - Connecting to assignment server
[22:45:19] Connecting to http://assign.stanford.edu:8080/
[22:45:19] Posted data.
[22:45:19] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[22:45:19] + News From Folding@Home: Welcome to Folding@Home
[22:45:20] Loaded queue successfully.
[22:45:20] Connecting to http://171.64.65.56:8080/
[22:45:25] Posted data.
[22:45:25] Initial: 0000; - Receiving payload (expected size: 4843462)
[22:45:41] - Downloaded at ~295 kB/s
[22:45:41] - Averaged speed for that direction ~220 kB/s
[22:45:41] + Received work.
[22:45:41] Trying to send all finished work units
[22:45:41] + No unsent completed units remaining.
[22:45:41] + Closed connections
[22:45:41]
[22:45:41] + Processing work unit
[22:45:41] At least 4 processors must be requested.
[22:45:41] Core required: FahCore_a2.exe
[22:45:41] Core found.
[22:45:41] Working on queue slot 04 [August 10 22:45:41 UTC]
[22:45:41] + Working ...
[22:45:41] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 04 -checkpoint 15 -forceasm -verbose -lifeline 5433 -version 624'

[22:45:41]
[22:45:41] *------------------------------*
[22:45:41] Folding@Home Gromacs SMP Core
[22:45:41] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[22:45:41]
[22:45:41] Preparing to commence simulation
[22:45:41] - Ensuring status. Please wait.
[22:45:50] - Assembly optimizations manually forced on.
[22:45:50] - Not checking prior termination.
[22:45:52] - Expanded 4842950 -> 24001453 (decompressed 495.5 percent)
[22:45:52] Called DecompressByteArray: compressed_data_size=4842950 data_size=24001453, decompressed_data_size=24001453 diff=0
[22:45:53] - Digital signature verified
[22:45:53]
[22:45:53] Project: 2675 (Run 0, Clone 59, Gen 117)
[22:45:53]
[22:45:53] Assembly optimizations on if available.
[22:45:53] Entering M.D.
[22:45:59] Using Gromacs checkpoints
NNODES=4, MYRANK=0, HOSTNAME=folding2
NNODES=4, MYRANK=1, HOSTNAME=folding2
NNODES=4, MYRANK=2, HOSTNAME=folding2
NODEID=0 argc=23
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_04.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 64
NODEID=1 argc=23
NNODES=4, MYRANK=3, HOSTNAME=folding2
NODEID=2 argc=23
NODEID=3 argc=23

Reading checkpoint file work/wudata_04.cpt generated: Thu May 14 19:58:43 2009


-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: checkpoint.c, line: 1151

Fatal error:
Checkpoint file is for a system of 146859 atoms, while the current system consists of 146817 atoms
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[04:01:10] - Autosending finished units... [August 11 04:01:10 UTC]
[04:01:10] Trying to send all finished work units
[04:01:10] + No unsent completed units remaining.
[04:01:10] - Autosend completed
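The fatal error in the log above is GROMACS refusing to resume: the leftover checkpoint describes a system of 146859 atoms, while the freshly downloaded work unit (Project 2675) has 146817. A toy sketch of that consistency check follows — this is not GROMACS code, just an illustration of the guard in checkpoint.c:

```python
def resume_from_checkpoint(checkpoint_atoms: int, system_atoms: int) -> None:
    # Mimics the GROMACS guard: a restart is only valid when the
    # checkpoint describes exactly the system being simulated.
    if checkpoint_atoms != system_atoms:
        raise RuntimeError(
            f"Checkpoint file is for a system of {checkpoint_atoms} atoms, "
            f"while the current system consists of {system_atoms} atoms"
        )

# The values from the log: a stale work/wudata_04.cpt vs. the new unit.
try:
    resume_from_checkpoint(146859, 146817)
except RuntimeError as e:
    print(e)
```

If this reading is right, a stale checkpoint from an earlier unit was picked up for the new slot, which is why wiping the offending files in the work directory is the usual first remedy.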
Telenet Business Fibernet 240 plus - Proximus Internet Office&Go Maxi fixed IP (failover)
Proximus Bizz Mobile L (smartphone) & Bizz Mobile XL (Mifi router) - 3StarsNet/FreePBX - Proximus TV

krisken
Elite Poster
Posts: 17178
Joined: 07 Nov 2006
Twitter: kriskenbe
Location: Massemen - 91WET0
Been thanked: 807 times
Recently thanked: 6 times
Has thanked: 1648 times
Contact:

Re: Problem with F@H

Post by krisken » 11 Aug 2009, 10:49

Seagull, you apparently know your way around F@H on Xubuntu/Ubuntu.
Could you help me with this?

I've got a little PC here that's on all day and doesn't really do much :)

Internet = Orange 100/10Mbps + WirelessBelgië + Billi (2x 100/20Mbps profile)
Telephony = WeePee + Speakup + Billi + OVH
Mobile = Orange Panter LE + Scarlet Red
TV = Bhaalu + Netflix + Orange
Network = Mikrotik & UBNT powered


Re: Problem with F@H

Post by seagull » 11 Aug 2009, 12:04

krisken wrote: Seagull, you apparently know your way around F@H on Xubuntu/Ubuntu.
Could you help me with this?

I've got a little PC here that's on all day and doesn't really do much :)

No problem. Just be sure to type in my username, eh. :lol:

biebel
Plus Member
Posts: 101
Joined: 07 Jun 2004
Location: Leuven
Been thanked: 4 times

Re: Problem with F@H

Post by biebel » 13 Aug 2009, 17:57

@ Krisken: As I already mentioned in viewtopic.php?f=41&t=23123:

Via the terminal: (the easiest is of course via the GUI)
1) Add the repo:

Code: Select all

sudo nano /etc/apt/sources.list


add the repo line
ctrl-o to save
ctrl-x to exit

2) Reload the repos:

Code: Select all

sudo apt-get update


3) Install Origami

Code: Select all

sudo apt-get install origami


4) Install FAH with Origami:
Follow the instructions from "Installation" onwards at https://help.ubuntu.com/community/FoldingAtHome/origami

Via GNOME (GUI):
1) Add the repo:
System > Administration > Software Sources > Third-party Software > Add


2) Reload the repos:
Click Reload when closing Software Sources

3) Install Origami
As with the CLI method, which is the simplest imo,
or System > Administration > Synaptic Package Manager > search for origami > tick and apply

4) Same as the CLI

It's really not that hard, and certainly easier than compiling it yourself... even I got it working.

If you've since upgraded from Hardy (8.04), skip step 1.
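For reference, steps 1-3 of the CLI route condense to the sketch below. It writes to a scratch copy instead of the real /etc/apt/sources.list so nothing on the system changes, and the deb line is a placeholder — the actual Origami repository entry is the one from the linked topic:

```shell
# Dry-run sketch of steps 1-3. Point SOURCES at /etc/apt/sources.list
# (and run with sudo) to do it for real; the deb line below is a
# placeholder, not the actual repository entry from the linked topic.
SOURCES=$(mktemp)
echo 'deb http://example.org/ubuntu jaunty main' >> "$SOURCES"  # step 1: append the repo line
grep -c '^deb ' "$SOURCES"                                      # sanity check: one repo line added
# step 2: sudo apt-get update
# step 3: sudo apt-get install origami
```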


Re: Problem with F@H

Post by seagull » 24 Aug 2009, 12:44

Folding is in a tangle again.
I'm starting to think I'll have to reinstall my clients.
Wiping the work directory and so on doesn't help.

Code: Select all

--- Opening Log file [August 24 11:28:25 UTC]
to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22866 system'
47250000 steps,  94500.0 ps (continuing from step 47000000,  94000.0 ps).

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483269. It should have been within [ 0 .. 9464 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483611. It should have been within [ 0 .. 256 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 3 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_3]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
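The range-checking error above is the neighbour-search step failing to place a particle on its grid: once a coordinate becomes NaN or ±Infinity (a "blown up" simulation), converting it to a cell index produces garbage like the -2147483269 in the log (a float-to-int overflow in C). A minimal Python illustration of the same assignment, with the check made explicit — not GROMACS code, and the cell width is a made-up value for illustration:

```python
import math

NCELLS = 9464     # grid size taken from the first error message
CELL_SIZE = 0.5   # hypothetical cell width, for illustration only

def grid_cell(x: float) -> int:
    """Assign a coordinate to a neighbour-search grid cell."""
    # A NaN or infinite coordinate cannot land in any cell; in C the
    # cast instead silently overflows to a nonsense index like the
    # one reported by mdrun.
    if math.isnan(x) or math.isinf(x):
        raise ValueError("coordinate is NaN/Inf - the system has blown up")
    ci = math.floor(x / CELL_SIZE)
    if not 0 <= ci < NCELLS:
        raise ValueError(f"ci={ci} outside [ 0 .. {NCELLS - 1} ]")
    return ci

print(grid_cell(10.0))  # a sane coordinate lands in cell 20
```

The advice in the message itself still stands: this usually means the work unit's trajectory became unstable, not that the client configuration is at fault.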


Re: Problem with F@H

Post by seagull » 24 Aug 2009, 16:40

I'll give this approach a try.

Edit 1: Back up and running :banana: almost 2,000,000 points :bdaysmile:
Edit 2: The 2,000,000 points are in :-D

Sven.VdS
Elite Poster
Posts: 911
Joined: 26 Mar 2004
Location: Holsbeek
Been thanked: 1 time

Re: Problem with F@H

Post by Sven.VdS » 17 Sep 2009, 14:56

LOL, I'm still 325th in the overall ranking ... I get the impression that F@H is dead :-o
Central heating is for sissies ... if you're cold it's because you don't have enough computers running.

meon
Administrator
Posts: 15585
Joined: 18 Feb 2003
Twitter: meon
Location: Bree
Been thanked: 506 times
Has thanked: 443 times
Contact:

Re: Problem with F@H

Post by meon » 17 Sep 2009, 15:15

I don't think F@H is dead, but in the past you had:
a) mass folders, like I used to be (100+ computers)
b) the rest

(a) racked up the points and ranked high, while (b) dangled at the bottom.

Now, thanks to GPU, PS3 and multicore folding, which score far more points, you have many more different people earning lots of points, but still nowhere near the amounts we used to get. The mass folders have dropped out (I needed about 100 PCs, for example, just to keep up with a single GPU folder in points), and I think that shift is only slowly becoming visible...


Re: Problem with F@H

Post by seagull » 26 Nov 2009, 16:58

Nice going again :evil: another 22860 project :bang:

[10:26:46] Completed 227270 out of 250000 steps (90%)
[10:36:40] Completed 227500 out of 250000 steps (91%)
[12:33:43] Completed 230000 out of 250000 steps (92%)
[13:47:44] Completed 232500 out of 250000 steps (93%)
[15:34:00] Completed 235000 out of 250000 steps (94%)
[15:34:00] Unit 2's deadline (November 26 14:26) has passed.
[15:34:00] Going to interrupt core and move on to next unit...
[15:34:00] CoreStatus = 0 (0)
[15:34:00] Client-core communications error: ERROR 0x0
[15:34:00] Deleting current work unit & continuing...

