Problem with F@H

Here you can post anything related to Folding@Home.
Closed
seagull
Elite Poster
Posts: 1973
Joined: 23 Nov 2006, 08:55
Twitter: WimVerlinden
Location: Kortenberg
Thanks given: 52 times
Thanked: 101 times

I have a problem with F@H that keeps coming back.
I have two Xubuntu 64-bit VMs on VMware 1.8 server.
The first VM kept having problems while folding, so I copied the second one, but now the problem is back.
The full log is below.
Does anyone have a tip?

Code:

[22:18:19] Completed 250000 out of 250000 steps  (100%)

Writing final coordinates.

 Average load imbalance: 127.3 %
 Part of the total run time spent waiting due to load imbalance: 71.1 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Z 0 %

NOTE: 71.1 % performance was lost due to load imbalance
      in the domain decomposition.


NOTE: 15 % of the run time was spent communicating energies,
      you might want to use the -nosum option of mdrun

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time: 173757.000 173757.000    100.0
                       2d00h15:57
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     72.276      3.040      0.241     99.521

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[22:18:21] DynamicWrapper: Finished Work Unit: sleep=10000
[22:18:25] 
[22:18:25] Finished Work Unit:
[22:18:25] - Reading up to 21148416 from "work/wudata_03.trr": Read 21148416
[22:18:25] trr file hash check passed.
[22:18:25] - Reading up to 4533104 from "work/wudata_03.xtc": Read 4533104
[22:18:25] xtc file hash check passed.
[22:18:25] edr file hash check passed.
[22:18:25] logfile size: 188167
[22:18:25] Leaving Run
[22:18:26] - Writing 26014439 bytes of core data to disk...
[22:18:26]   ... Done.
Error encountered before initializing MPICH
[22:18:31] - Shutting down core
[22:18:31] 
[22:18:31] Folding@home Core Shutdown: FINISHED_UNIT
[22:21:57] CoreStatus = 64 (100)
[22:21:57] Unit 3 finished with 31 percent of time to deadline remaining.
[22:21:57] Updated performance fraction: 0.515123
[22:21:57] Sending work to server
[22:21:57] Project: 2672 (Run 0, Clone 110, Gen 165)


[22:21:57] + Attempting to send results [August 10 22:21:57 UTC]
[22:21:57] - Reading file work/wuresults_03.dat from core
[22:21:57]   (Read 26014439 bytes from disk)
[22:21:57] Connecting to http://171.64.65.56:8080/
[22:45:06] Posted data.
[22:45:16] Initial: 0000; - Uploaded at ~18 kB/s
[22:45:16] - Averaged speed for that direction ~24 kB/s
[22:45:16] + Results successfully sent
[22:45:16] Thank you for your contribution to Folding@Home.
[22:45:16] + Number of Units Completed: 95

[22:45:19] - Warning: Could not delete all work unit files (3): Core file absent
[22:45:19] Trying to send all finished work units
[22:45:19] + No unsent completed units remaining.
[22:45:19] - Preparing to get new work unit...
[22:45:19] + Attempting to get work packet
[22:45:19] - Will indicate memory of 1004 MB
[22:45:19] - Connecting to assignment server
[22:45:19] Connecting to http://assign.stanford.edu:8080/
[22:45:19] Posted data.
[22:45:19] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[22:45:19] + News From Folding@Home: Welcome to Folding@Home
[22:45:20] Loaded queue successfully.
[22:45:20] Connecting to http://171.64.65.56:8080/
[22:45:25] Posted data.
[22:45:25] Initial: 0000; - Receiving payload (expected size: 4843462)
[22:45:41] - Downloaded at ~295 kB/s
[22:45:41] - Averaged speed for that direction ~220 kB/s
[22:45:41] + Received work.
[22:45:41] Trying to send all finished work units
[22:45:41] + No unsent completed units remaining.
[22:45:41] + Closed connections
[22:45:41] 
[22:45:41] + Processing work unit
[22:45:41] At least 4 processors must be requested.Core required: FahCore_a2.exe
[22:45:41] Core found.
[22:45:41] Working on queue slot 04 [August 10 22:45:41 UTC]
[22:45:41] + Working ...
[22:45:41] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 04 -checkpoint 15 -forceasm -verbose -lifeline 5433 -version 624'

[22:45:41] 
[22:45:41] *------------------------------*
[22:45:41] Folding@Home Gromacs SMP Core
[22:45:41] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[22:45:41] 
[22:45:41] Preparing to commence simulation
[22:45:41] - Ensuring status. Please wait.
[22:45:50] - Assembly optimizations manually forced on.
[22:45:50] - Not checking prior termination.
[22:45:52] - Expanded 4842950 -> 24001453 (decompressed 495.5 percent)
[22:45:52] Called DecompressByteArray: compressed_data_size=4842950 data_size=24001453, decompressed_data_size=24001453 diff=0
[22:45:53] - Digital signature verified
[22:45:53] 
[22:45:53] Project: 2675 (Run 0, Clone 59, Gen 117)
[22:45:53] 
[22:45:53] Assembly optimizations on if available.
[22:45:53] Entering M.D.
[22:45:59] Using Gromacs checkpoints
NNODES=4, MYRANK=0, HOSTNAME=folding2
NNODES=4, MYRANK=1, HOSTNAME=folding2
NNODES=4, MYRANK=2, HOSTNAME=folding2
NODEID=0 argc=23
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_04.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 64
NODEID=1 argc=23
NNODES=4, MYRANK=3, HOSTNAME=folding2
NODEID=2 argc=23
NODEID=3 argc=23

Reading checkpoint file work/wudata_04.cpt generated: Thu May 14 19:58:43 2009


-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: checkpoint.c, line: 1151

Fatal error:
Checkpoint file is for a system of 146859 atoms, while the current system consists of 146817 atoms
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[04:01:10] - Autosending finished units... [August 11 04:01:10 UTC]
[04:01:10] Trying to send all finished work units
[04:01:10] + No unsent completed units remaining.
[04:01:10] - Autosend completed
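The fatal error in this log is GROMACS refusing a checkpoint whose atom count (146859) does not match the work unit (146817). The same log shows that `work/wudata_04.cpt` was generated on Thu May 14 2009, while this unit was only assigned on August 10, which suggests (an inference, not something confirmed in this thread) that a checkpoint left over from an earlier unit in queue slot 04 was picked up. A minimal Python sketch that pulls both counts out of a log; `checkpoint_mismatch` is a hypothetical helper, and it only matches the exact wording printed above:

```python
import re

# Matches the GROMACS fatal error quoted in the log above (exact wording assumed).
CPT_MISMATCH = re.compile(
    r"Checkpoint file is for a system of (\d+) atoms, "
    r"while the current system consists of (\d+) atoms"
)


def checkpoint_mismatch(log_text):
    """Return (checkpoint_atoms, work_unit_atoms) if the log contains the
    checkpoint size mismatch error, otherwise None."""
    match = CPT_MISMATCH.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = ("Checkpoint file is for a system of 146859 atoms, "
              "while the current system consists of 146817 atoms")
    found = checkpoint_mismatch(sample)
    if found:
        cpt_atoms, wu_atoms = found
        print(f"stale checkpoint? {cpt_atoms} vs {wu_atoms} atoms "
              f"(difference {cpt_atoms - wu_atoms})")
```

Run against a full client log, a match on a freshly downloaded unit is a hint to check whether an old `.cpt` file was left behind in `work/`.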
Business Fibernet 300 plus with Speedboost 1G Business - Proximus Internet Maxi fixed IP (failover)
Business Mobile Flex + (smartphone) & Business Mobile Flex (Mifi router) - 3CX/Teams Direct Routing - Proximus TV
krisken
userbase crew
Posts: 19763
Joined: 07 Nov 2006, 12:11
Twitter: kriskenbe
Location: Massemen - 91WET0
Thanks given: 1857 times
Thanked: 1035 times

Seagull, you apparently know your way around F@H on Xubuntu/Ubuntu.
Could you, or would you be willing to, help me with it?

I have another little PC here that is on all day and doesn't really do much :)

Internet = Orange 150/15Mbps + WirelessBelgië
Telephony = EDPnet + OVH
Mobile = Orange Go Extreme SE + Scarlet Red
TV = TVV App + Netflix + Disney+ + Streamz
Network = Mikrotik + Ubiquiti
seagull

krisken wrote: Seagull, you apparently know your way around F@H on Xubuntu/Ubuntu.
Could you, or would you be willing to, help me with it?

I have another little PC here that is on all day and doesn't really do much :)
No problem. Just make sure you type in my username, eh. :lol:
biebel
Plus Member
Posts: 101
Joined: 08 Jun 2004, 00:25
Location: Leuven
Thanked: 4 times

@ Krisken: As I already pointed out in http://userbase.be/forum/viewtopic.php?f=41&t=23123:

Via the terminal (the easiest way is of course via the GUI):
1) Add the repository:

Code:

sudo nano /etc/apt/sources.list
add the repository line
ctrl-o to save
ctrl-x to exit

2) Reload the repository:

Code:

sudo apt-get update
3) Install Origami

Code:

sudo apt-get install origami
4) Install FAH with Origami:
Follow the instructions from the "installation" section onward at https://help.ubuntu.com/community/FoldingAtHome/origami

Via GNOME (GUI):
1) Add the repository:
System > Administration > Software Sources > Third-party Software > Add
2) Reload the repository:
Click Reload when closing Software Sources

3) Install Origami
Same as the CLI method, which is the simplest IMO,
or System > Administration > Synaptic Package Manager > search for origami > check it and apply

4) Same as the CLI method

It's really not that hard, and certainly easier than compiling it yourself... even I got it working.

If you have upgraded from Hardy (8.04) in the meantime, skip step 1.
seagull

Folding has gotten itself tangled up again.
I think I'll have to reinstall my clients.
Deleting the work directory etc. doesn't help.

Code:

--- Opening Log file [August 24 11:28:25 UTC] 
to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22866 system'
47250000 steps,  94500.0 ps (continuing from step 47000000,  94000.0 ps).

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483269. It should have been within [ 0 .. 9464 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483611. It should have been within [ 0 .. 256 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 3 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_3]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
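Both out-of-range `ci` values lie just above the smallest signed 32-bit integer, -2147483648. That fits the explanation GROMACS prints: on x86, converting a NaN or out-of-range floating-point value to a 32-bit integer yields 0x80000000 (in C such a conversion is undefined behavior, so this is platform-specific). That the small offsets come from further index arithmetic in nsgrid.c is an assumption, not something the log shows. A minimal Python sketch of the arithmetic only; `offset_from_int32_min` is a hypothetical helper:

```python
INT32_MIN = -2**31  # -2147483648, the smallest signed 32-bit integer


def offset_from_int32_min(value):
    """Distance of a logged grid index from the smallest 32-bit integer."""
    return value - INT32_MIN


# The two out-of-range cell indices reported by nsgrid.c in the log above.
for ci in (-2147483269, -2147483611):
    print(ci, "is INT32_MIN +", offset_from_int32_min(ci))
```

The offsets (379 and 37) are tiny compared with the 2^31 range, which is consistent with an index derived from a NaN or infinite coordinate rather than a real grid cell.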
seagull

I'll give this method a try.

Edit 1: Back up and running :banana: almost 2,000,000 points :bdaysmile:
Edit 2: The 2,000,000 points are in :-D
Sven.VdS
Elite Poster
Posts: 911
Joined: 26 Mar 2004, 20:01
Location: Holsbeek
Thanked: 1 time

LOL, I'm still ranked 325th overall... I kind of get the impression F@H is dead :-o
Central heating is for sissies ... if you're cold it's because you don't have enough computers running.
meon
Administrator
Posts: 16726
Joined: 18 Feb 2003, 22:02
Twitter: meon
Location: Bree
Thanks given: 573 times
Thanked: 770 times

I don't think F@H is dead, but in the past you had:
a) mass folders, which is what I did (100+ computers)
b) everyone else

a) brought in the points and ranked high, while b) trailed behind.

Now, thanks to GPU, PS3 and multicore folding, which earn far more points, you have many more different people earning lots of points, but still nowhere near the amounts we used to produce. The mass folders have dropped out (for example, I needed 100 PCs just to keep up with a single GPU folder in points), and I think that shift is only slowly becoming visible...
seagull

Well done, again :evil: yet another 22860 project :bang:
[10:26:46] Completed 227270 out of 250000 steps (90%)
[10:36:40] Completed 227500 out of 250000 steps (91%)
[12:33:43] Completed 230000 out of 250000 steps (92%)
[13:47:44] Completed 232500 out of 250000 steps (93%)
[15:34:00] Completed 235000 out of 250000 steps (94%)
[15:34:00] Unit 2's deadline (November 26 14:26) has passed.
[15:34:00] Going to interrupt core and move on to next unit...
[15:34:00] CoreStatus = 0 (0)
[15:34:00] Client-core communications error: ERROR 0x0
[15:34:00] Deleting current work unit & continuing...
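Here the client dropped the unit at 94% because its deadline (November 26 14:26) had already passed. The progress lines alone are enough to see how far behind a unit is running. A minimal Python sketch, assuming the exact line format shown above and timestamps that all fall on one day; `estimate_remaining` is a hypothetical helper, not part of the F@H client:

```python
import re
from datetime import datetime, timedelta

# Matches progress lines such as "[10:26:46] Completed 227270 out of 250000 steps (90%)".
PROGRESS = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps")


def estimate_remaining(log_lines):
    """Average seconds per step between the first and last progress line,
    plus the projected time for the steps still to do.

    Assumes all progress lines fall on the same day, because the client
    timestamps carry no date. Returns None if there is too little data.
    """
    points = []
    for line in log_lines:
        m = PROGRESS.search(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S")
            points.append((stamp, int(m.group(2)), int(m.group(3))))
    if len(points) < 2:
        return None
    (t0, done0, _), (t1, done1, total) = points[0], points[-1]
    if done1 <= done0:
        return None
    sec_per_step = (t1 - t0).total_seconds() / (done1 - done0)
    return sec_per_step, timedelta(seconds=sec_per_step * (total - done1))


if __name__ == "__main__":
    log = [
        "[10:26:46] Completed 227270 out of 250000 steps (90%)",
        "[10:36:40] Completed 227500 out of 250000 steps (91%)",
        "[12:33:43] Completed 230000 out of 250000 steps (92%)",
        "[13:47:44] Completed 232500 out of 250000 steps (93%)",
        "[15:34:00] Completed 235000 out of 250000 steps (94%)",
    ]
    result = estimate_remaining(log)
    if result:
        sec_per_step, left = result
        print(f"{sec_per_step:.2f} s/step, about {left.total_seconds() / 3600:.1f} h left")
```

With the five progress lines quoted above this gives roughly 2.4 s per step, so the remaining 15000 steps would have needed close to ten more hours. Measuring from the first to the last line averages out the uneven gaps between frames (about 10 minutes for one, almost two hours for another).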
