4.72 crashes instantly

Kenneth Larsen
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1295 - Posted: 29 Mar 2009, 16:30:46 UTC

Since the version update, all my work units crash instantly with this info:

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
qcn_util::getBOINCInitData requested at 1238312154.734082
Significance Filter Cutoff = 3.000000
Short Term Average Magnitude = 3.000000
Motion sensor initialized of type 7 - JoyWarrior 24F8 USB.
Time synchronization failed local time = 1238312170.615695, will retry in 3 minutes - elapsed time = 15.820867
Time synchronization failed local time = 1238312366.277904, will retry in 3 minutes - elapsed time = 15.580143
Time synchronization failed local time = 1238312561.636182, will retry in 3 minutes - elapsed time = 15.335965
Time synchronization failed local time = 1238312757.398081, will retry in 3 minutes - elapsed time = 15.700846
Time synchronization failed local time = 1238312953.044243, will retry in 3 minutes - elapsed time = 15.546116
QCN exiting, can't make directory ../../data/2009_03_29

zip error: Nothing to do! (../../data/.zip)
Time synchronization failed local time = 1238313148.646996, will retry in 3 minutes - elapsed time = 15.552161
Time synchronization failed local time = 1238313344.544008, will retry in 3 minutes - elapsed time = 15.848812
Time synchronization failed local time = 1238313540.217899, will retry in 3 minutes - elapsed time = 15.621482
QCN exiting, can't make directory ../../data/2009_03_29
SIGSEGV: segmentation violation
Stack trace (12 frames):
qcn_4.72_i686-pc-linux-gnu__nci(boinc_catch_signal+0x65)[0x8069445]
[0xffffe400]
/lib/libc.so.6(fclose+0x1d)[0xb7dacd4d]
qcn_4.72_i686-pc-linux-gnu__nci[0x807d60f]
qcn_4.72_i686-pc-linux-gnu__nci[0x8074249]
qcn_4.72_i686-pc-linux-gnu__nci[0x804daf4]
qcn_4.72_i686-pc-linux-gnu__nci[0x8057f31]
qcn_4.72_i686-pc-linux-gnu__nci[0x805834e]
qcn_4.72_i686-pc-linux-gnu__nci[0x8058931]
qcn_4.72_i686-pc-linux-gnu__nci[0x805917f]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7d6a630]
qcn_4.72_i686-pc-linux-gnu__nci(__gxx_personality_v0+0xf5)[0x804c371]

Exiting...

</stderr_txt>
]]>

Link to the machine

Carl Christensen
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jan 08
Posts: 1037
Credit: 11,336
RAC: 34
Message 1299 - Posted: 29 Mar 2009, 17:59:00 UTC - in response to Message 1295.

Thanks, I'll have to look into that. The odd thing is the "can't make directory" message, which refers to a QCNLive directory name, not the BOINC trigger directory.

Kenneth Larsen
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1302 - Posted: 29 Mar 2009, 20:11:09 UTC
Last modified: 29 Mar 2009, 20:12:41 UTC

I just checked: the data directory actually has owner/group root/root. Could this be the problem? I'll try changing it to boinc/boinc and see what happens (when I'm able to download a new work unit).
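
For anyone wanting to run the same check on their own host, here is a minimal sketch (not part of QCN or BOINC) that reports who owns a directory and whether the calling user can write into it. The function name `describe_dir_access` is hypothetical, and the path you pass in is whatever your data directory happens to be.

```python
import grp
import os
import pwd


def describe_dir_access(path):
    """Report owner, group and write access for `path` (hypothetical helper)."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no group entry
    return {
        "owner": owner,
        "group": group,
        "owner_uid": st.st_uid,
        # Creating entries inside a directory needs write and search permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }
```

Run it as the account BOINC runs under. An owner of root together with `writable` being False would be consistent with the "can't make directory" message, although it does not prove that ownership is the only cause.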

Carl Christensen
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jan 08
Posts: 1037
Credit: 11,336
RAC: 34
Message 1305 - Posted: 30 Mar 2009, 4:32:30 UTC - in response to Message 1302.

That could be it; still, it should have exited more cleanly than with the SIGSEGV.
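
As an illustration of the "exit more cleanly" point: the stack trace in the first post includes an fclose frame, which would be consistent with a file handle being closed after the directory step had already failed, though that is a guess and not a confirmed diagnosis. A minimal Python sketch of the general pattern (check the directory step, report, and return before touching any file) might look like this. It is not QCN code; the function name `write_trigger_file` and its parameters are hypothetical.

```python
import os
import sys


def write_trigger_file(data_dir, name, payload):
    """Create data_dir if needed and write payload; return True on success."""
    try:
        os.makedirs(data_dir, exist_ok=True)
    except OSError as err:
        # Report and stop here, so no file handle is ever opened or closed.
        print(f"can't make directory {data_dir}: {err}", file=sys.stderr)
        return False

    path = os.path.join(data_dir, name)
    try:
        # The with-block closes the handle only if open() succeeded.
        with open(path, "wb") as fh:
            fh.write(payload)
    except OSError as err:
        print(f"can't write {path}: {err}", file=sys.stderr)
        return False
    return True
```

The caller can then decide whether a False result should end the task with a normal error code instead of a crash.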

Kenneth Larsen
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1306 - Posted: 30 Mar 2009, 14:55:12 UTC

The above change fixed the problem, though I don't know why the directories had root as owner initially... Boinc is run as user "boinc" on this system.

Kenneth Larsen
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1307 - Posted: 30 Mar 2009, 16:49:00 UTC - in response to Message 1306.

I was a little too fast there - the work unit still crashes with 0 seconds cpu-time, although it does run for some time until it crashes (until the first disk write, I suppose).

MarcinGorecki
Joined: 3 Mar 09
Posts: 5
Credit: 12,483
RAC: 0
Message 1320 - Posted: 1 Apr 2009, 0:15:14 UTC
Last modified: 1 Apr 2009, 0:15:55 UTC

Hi.
Kubuntu 9.04 jaunty x64 + fresh sensor :)

1. gora@gora-core:/$ lsmod
Module Size Used by
joydev 20864 2
nls_iso8859_1 13440 0
etc...

2. ..........
Mar 31 16:54:08 gora-core kernel: [56082.409004] usb 6-1: new low speed USB device using uhci_hcd and address 12
Mar 31 16:54:08 gora-core kernel: [56082.596632] usb 6-1: configuration #1 chosen from 1 choice
Mar 31 16:54:08 gora-core kernel: [56082.630614] input: Code Mercenaries JoyWarrior24 Force 8 as /devices/pci0000:00/0000:00:1d.2/usb6/6-1/6-1:1.0/input/input19
Mar 31 16:54:08 gora-core kernel: [56082.657031] generic-usb 0003:07C0:1113.000D: input,hidraw2: USB HID v1.10 Joystick [Code Mercenaries JoyWarrior24 Force 8] on usb-0000:00:1d.2-1/input0
Mar 31 16:54:08 gora-core kernel: [56082.697614] input: Code Mercenaries JoyWarrior24 Force 8 as /devices/pci0000:00/0000:00:1d.2/usb6/6-1/6-1:1.1/input/input20
Mar 31 16:54:08 gora-core kernel: [56082.709052] generic-usb 0003:07C0:1113.000E: input,hidraw3: USB HID v1.10 Device [Code Mercenaries JoyWarrior24 Force 8] on usb-0000:00:1d.2-1/input1
Mar 31 17:19:39 gora-core -- MARK --
Mar 31 17:39:39 gora-core -- MARK --

3.
jscalibration working ok.

http://qcn.stanford.edu/qcnalpha/show_host_detail.php?hostid=5124

<- Sensor Type: Not Found

Any idea how I can fix this?

Carl Christensen
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jan 08
Posts: 1037
Credit: 11,336
RAC: 34
Message 1322 - Posted: 1 Apr 2009, 4:31:57 UTC - in response to Message 1320.

Hmm, what's the /dev node for this "joystick" driver? It's supposed to be something like:

"/dev/js0", \
"/dev/input/js0", \
"/dev/js1", \
"/dev/input/js1", \
"/dev/js2", \
"/dev/input/js2", \

(the above values are the only ones I check!)
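
For illustration, the probing described above (try a fixed list of device nodes in order and use the first one that exists) can be sketched as follows. This is not the QCN source; the function name `find_joystick_device` and the injectable `exists` parameter are hypothetical, and only the six paths are taken from the list above.

```python
import os

# Device nodes from the list above, in the same order.
CANDIDATE_DEVICES = (
    "/dev/js0",
    "/dev/input/js0",
    "/dev/js1",
    "/dev/input/js1",
    "/dev/js2",
    "/dev/input/js2",
)


def find_joystick_device(candidates=CANDIDATE_DEVICES, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None
```

A sensor that shows up under any other node name would not be found by a check like this, so comparing the node the kernel actually created against the list is a reasonable first step.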

MarcinGorecki
Joined: 3 Mar 09
Posts: 5
Credit: 12,483
RAC: 0
Message 1323 - Posted: 1 Apr 2009, 7:12:19 UTC - in response to Message 1322.

http://img239.imageshack.us/img239/5215/49619249.png

/dev/input/js0

Carl Christensen
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jan 08
Posts: 1037
Credit: 11,336
RAC: 34
Message 1324 - Posted: 1 Apr 2009, 8:40:17 UTC - in response to Message 1323.

check the qcn stderr.txt in boincdir/slots/0 for any joystick msgs

MarcinGorecki
Joined: 3 Mar 09
Posts: 5
Credit: 12,483
RAC: 0
Message 1325 - Posted: 1 Apr 2009, 8:55:55 UTC - in response to Message 1324.
Last modified: 1 Apr 2009, 9:43:03 UTC

WU:
http://qcn.stanford.edu/qcnalpha/show_host_detail.php?hostid=5124

hardware details + calibrator:
http://img23.imageshack.us/img23/2629/81227541.png

Stderr.txt:
http://www3.speedyshare.com/data/562780352/16057660/90013239/stderr.txt

slot0 is taken by another project
slot5-> http://img175.imageshack.us/img175/4374/36056804.png

MarcinGorecki
Joined: 3 Mar 09
Posts: 5
Credit: 12,483
RAC: 0
Message 1326 - Posted: 1 Apr 2009, 15:38:45 UTC - in response to Message 1325.
Last modified: 1 Apr 2009, 15:43:08 UTC

After detach-> new WU:
stderr.txt:


shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
qcn_util::getBOINCInitData requested at 1238599189.442164
Significance Filter Cutoff = 3.000000
Short Term Average Magnitude = 3.000000
Motion sensor initialized of type 7 - JoyWarrior 24F8 USB.
Time synchronization failed local time = 1238599192.678877, will retry in 3 minutes - elapsed time = 3.154076
Time synchronization failed local time = 1238599375.880292, will retry in 3 minutes - elapsed time = 3.155493
Time synchronization failed local time = 1238599559.197723, will retry in 3 minutes - elapsed time = 3.212734
Time synchronization failed local time = 1238599742.380168, will retry in 3 minutes - elapsed time = 3.155371


gora@gora-core:~/BOINC/data$ ls

_000001_1238599347.zip
_000002_1238599455.zip

2009_04_01/
_000002_1238599800.zip
_000002_1238600400.zip




gora@gora-core:/$ lsmod
Module Size Used by
joydev 20864 1

BOINC msg:
1 kwi 2009, 16:31:04|QCN Alpha Test|Resuming task qb_sc300_sta300_003983_0 using qcnalpha version 472

WU is running (using 0.01% CPU) 0.000%

OS- Jaunty 9.04 64bit, client BOINC 6.4.5 x64

Carl Christensen
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jan 08
Posts: 1037
Credit: 11,336
RAC: 34
Message 1327 - Posted: 1 Apr 2009, 16:13:02 UTC - in response to Message 1326.
Last modified: 1 Apr 2009, 19:13:36 UTC

It looks like it's detected, but there are other problems I'll have to fix, such as why the filenames are off and why it's making the qcnlive subdirectory. I don't have access to a Linux box right now, so it may take a week or two (I have a new Windows laptop coming in that I hope to dual boot with Linux).

edit: the problem seems to be that BOINC thinks the Linux app is running in "standalone" mode, hence it tries making these odd directories that qcnlive uses etc. Hopefully it will be easy to fix (probably don't rely on the "boinc_is_standalone()" function call!) :-)

Kenneth Larsen
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1329 - Posted: 1 Apr 2009, 19:13:55 UTC

An update on my problem:
The last unit I've downloaded has now run for 1 day and 12 hours and is 150% complete (according to the graphics)! CPU time 4h43m, progress according to Boinc Manager 0%. No trickles have been uploaded, but the xyz and composite graphs are all showing activity.

Kenneth Larsen
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1357 - Posted: 5 Apr 2009, 12:40:09 UTC

New update: I aborted the last work unit as it had reached 450%. I then reset the project and waited for a new unit. That one is running now, but every time it resets itself it spawns a new project, the old one being "defunct". This has happened 41 times so far.

Kenneth Larsen
Send message
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1387 - Posted: 11 Apr 2009, 9:27:47 UTC

I've just downloaded the 4.74 app, and already it seems to work much better: progress percent is updating, which it wasn't with the 4.72.
I'll keep you updated when I get home from work this evening...

Kenneth Larsen
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1389 - Posted: 11 Apr 2009, 17:44:42 UTC

... and it's trickling fine, so thanks for solving the problem, now I'm back on the trigger map! ;-)

talister
Joined: 17 May 08
Posts: 6
Credit: 50,550
RAC: 591
Message 1390 - Posted: 11 Apr 2009, 18:17:43 UTC

4.74 also fixes my problem with 4.72, where it would just sit there and the percentage wouldn't increase. Just jiggled my sensor and created an "earthquake" trickle ;-) Will monitor to see if the plague of ntpdate processes comes back.

Carl Christensen
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jan 08
Posts: 1037
Credit: 11,336
RAC: 34
Message 1394 - Posted: 11 Apr 2009, 21:08:04 UTC - in response to Message 1390.
Last modified: 11 Apr 2009, 21:10:09 UTC

glad to hear it, I really didn't get a chance to do any Linux-specific fixes but just recompiled it for the new graphics (2D) I'm working on for schools/qcnlive. also I changed a little bit of the code for the USB sensors which should help trigger detection.

my only Linux "test box" is a Ubuntu VMWare image I run on my Mac or Windows laptop -- it seemed to work OK except I get no globe view (white outline instead of the map!) and I see triggers on the graphics (vertical dashed purple line) when I shake the sensor, but BOINC never seems to report it.

Is this typical (i.e. do you guys see a "white globe" & no trickles) or maybe it's just something odd with running under VMWare...

PS -- OK just looked at you guys' computer pages, you seem to be time-syncing & triggering OK, so that's nice to know, it must be just my VMWare.

hopefully when I settle down (I've been on the road for two years it seems) I can get a proper Linux box to use....

Kenneth Larsen
Joined: 16 Dec 08
Posts: 36
Credit: 49,867
RAC: 0
Message 1396 - Posted: 11 Apr 2009, 21:44:15 UTC
Last modified: 11 Apr 2009, 21:46:39 UTC

Couldn't you repartition the drive and dual boot with Linux? I admit it's been a few years since I've done this (I've been Windows-free for several years now!), but as far as I remember you should first install Windows, then Linux, and then repair the Windows boot sector using the Windows install CD.

By the way, the globe looks just fine on my machine, but it might be a driver issue... I can post my versions of the relevant graphics libraries if you are interested, just tell me which are used.



Copyright © 2013 Stanford University