Hi,
I tried to run Simutrans Experimental 7.3 on Linux (Ubuntu) but I keep getting a segfault.
gdb output:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000613063 in display_ddd_box_clip(short, short, short, short, unsigned short, unsigned short) ()
I tried the i386 and amd64 versions, tried deleting the settings.xml, nothing works.
I use the build from http://www.43-1.org/~simutrans/simutrans-exp/i386/
Anyone managed to run STE on Linux ?
Gyom,
thank you for your report. Judging by your memory address, you are on a 64-bit system, yes? In that case, you should use the 64-bit version. I have not seen any crashes in that method before. Indeed, that is one that is not unique to Simutrans-Experimental (that part of the program is exactly the same as Standard). Have you tried running Standard on Linux?
on my ubuntu 64-bit system the 32-bit version runs, unlike the 64-bit version.
I run it on Linux (Debian) all the time -- but I have a 32-bit system.
Looks like we may have a problem with 64-bit cleanliness in the code. This is going to be tedious to find because there's lots of slightly sloppy integer type usage. I'm trying to clean that up, but it will take a long time.
If you could give a full backtrace ('bt' in gdb) it might help.
Starting program: ~/simuexp/simutrans-exp-64-2010-04-11-fd32563
[Thread debugging using libthread_db enabled]
Reading low level config data ...
parse_simuconf() at config/simuconf.tab: Reading simuconf.tab successful!
Preparing display ...
Screen Flags: requested=10, actual=10
Loading font 'font/prop.fnt'
font/prop.fnt sucessfully loaded as old format prop font!
Init done.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000613063 in display_ddd_box_clip(short, short, short, short, unsigned short, unsigned short) ()
(gdb) bt
#0 0x0000000000613063 in display_ddd_box_clip(short, short, short, short, unsigned short, unsigned short) ()
#1 0x000000000048e011 in button_t::zeichnen(koord) ()
#2 0x00000000004d15c7 in gui_container_t::zeichnen(koord) ()
#3 0x00000000004a18b3 in gui_scrollpane_t::zeichnen(koord) ()
#4 0x00000000004d15c7 in gui_container_t::zeichnen(koord) ()
#5 0x00000000004f9be3 in pakselector_t::zeichnen(koord, koord) ()
#6 0x000000000059ad39 in simu_main(int, char**) ()
#7 0x0000000000616664 in main ()
thanks for the howto in:
http://forum.simutrans.com/index.php?topic=4871.msg48080#msg48080
Thanks Sdog ! After reading your post I downloaded again the i386 version and indeed it is running fine !
I must have mixed it up while copying with the amd64 :-X
My backtrace for the amd64 is the exact same as Sdog.
Thank you all for your answers !
At last I can try the new version ! :)
Hmm, looks like these are compiled without debugging information. :-( A build with debugging information would give us *line numbers*.
It's not actually much slower, it just makes the files a bit larger. To whoever does the automatic builds, could you perhaps build with DEBUG=1 to get us debug information in the official builds of experimental? It seems to need a lot of debugging, so....
I'm guessing there's heavy inlining going on because there's nothing suspicious in the named routine, but there *is* suspicious stuff in the routines it calls: display_fb_internal and display_vl_internal. The first thing to try is a recompile with -DUSE_C activated, because I bet that x86 assembly language code has embedded assumptions about the size of int. If that fails, then the USE_C version probably has embedded assumptions too.
Standard is probably broken on amd64 too.
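One way a size assumption like this can creep into a fill routine is by writing two 16-bit pixels at once through a type whose width differs between targets (`long` is 4 bytes on 32-bit Linux and 8 bytes on 64-bit Linux). Below is a minimal sketch that uses fixed-width types instead. It assumes 16-bit pixels, `fill_pixels` is a hypothetical name, and it is not the code from simgraph16.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical pixel fill for illustration only (not the Simutrans routine).
// Writes `count` 16-bit pixels of value `colour`, two at a time where possible.
inline void fill_pixels(std::uint16_t* dest, std::size_t count, std::uint16_t colour)
{
    // Exactly two pixels packed into exactly 32 bits on every platform.
    const std::uint32_t pair = (static_cast<std::uint32_t>(colour) << 16) | colour;

    std::size_t i = 0;
    for (; i + 2 <= count; i += 2) {
        // memcpy avoids alignment and aliasing problems of casting the pointer.
        std::memcpy(dest + i, &pair, sizeof pair);
    }
    if (i < count) {
        dest[i] = colour; // odd pixel left over at the end
    }
}
```

Because both halves of `pair` are identical, the result does not depend on byte order, and the loop never writes past `dest + count`.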
Done. Debugging information should be included starting with the next build.
Ahh - I had taken debugging information out on the "official" builds because I wanted a clean release build for people to use that is as optimised as possible, without (for example) lots of assert checks slowing things down. Indeed, I specifically use the #ifdef DEBUG flag to give additional (and not so user friendly) information in the GUI on occasions. Certainly, the Windows version is compiled as a "release" build in MSVC++.
What, I think, we could really do with is having nightly builds for Experimental on all platforms: the nightly builds would have the debugging information turned on, and the release builds would have it turned off.
As to the original poster's query - if I recall correctly, I think that there are 64-bit Linux builds of Standard. Perhaps some testing could be done of the Standard version to check whether this is a problem there, too? If it is, perhaps this topic needs to be moved out of the "Experimental" section of the board.
Agreed. This would be very helpful. Could one of the people with AMD64, where experimental is failing, try the "Standard" 64-bit build, so we know whether the problem is here or there?
as far as i remember it was the same issue in standard.
i'm just waiting for the nightly page to come online again, then i'll try.
sim-linux64_2010-04-27_v102.3_r3185
runs
SDog,
thank you very much for the test there. I don't currently have a Linux system on which to test Simutrans-Experimental, so this might be very hard for me to track down. Is anyone with a 64-bit Linux platform and the ability to compile from source able to assist here? If somebody could compile a build with all the debugging information turned on, it would be useful to have a full backtrace.
The difficult thing about this bug is that it appears to occur in code untouched by Experimental (and thus code with which I am entirely unfamiliar). Can anyone with 64-bit Linux confirm whether the 32-bit version runs satisfactorily? Simutrans does not, to my knowledge, benefit in any way from being compiled in 64-bit: indeed, there are no 64-bit Windows builds for this reason, but it has been suggested that a 64-bit build can be more stable on 64-bit Linux than a 32 bit build.
Finally, can anyone confirm whether this bug still occurs with the latest devel branch? If it works in the most recent versions of Simutrans-Standard, it is possible that some recent change to the code in Standard helped to fix the problem that was present earlier - I do recall that there have been some changes to the code recently that deal with platform/architecture issues.
i suppose it's not very urgent, but it would still be good to track down the problem. whether i can help depends mostly on the makefile. can you point me to the github url? (just found it, overlooked it two times.) Give me also some time to get used to git again.
Sdog,
thank you very much for volunteering to help - much appreciated :-)
Hi,
"me too" lol. I'm running Gentoo amd64 and the 64bit version of STE segfaults whilst the 32bit version runs fine (ok so far I'm only at the start screen). Do you still need backtraces for this?
I have a little request too - any chance the names of the downloads could be made more informative? Ideal would be a name like "simutrans-experimental-amd64-2010-05-02" (same for the experimental Pak btw, the download filename is without version number). That way it would be much easier to track which version people used when encountering problems.
Here's the MD5 for the version that segfaults:
fd38cb53b873621aeba0c3dd2866ee22 simutrans-exp-latest
And thanks very much for this amazing game :)
Steffen,
glad that you enjoy Simutrans-Experimental! Ansgar is in charge of the Linux builds, so a request to change the names should be directed to him. As to the crashes - a backtrace would indeed be useful, as it would be good to get a 64-bit version running in case people have problems running the 32-bit version (although note that there is no performance advantage to the 64-bit version as I understand it: if Simutrans is using anywhere near 4Gb of memory, something is wrong in any case).
I should also be interested in your experiences of the stability of the 32-bit version on 64-bit Linux, to see whether a 64-bit version really is needed at all.
Actually, if you poke around a little, you'll find that the files are already available with dates. simutrans-experimental-latest is a link to the dated one.
Perhaps we should simply point people to
http://www.43-1.org/~simutrans/simutrans-exp/i386/
http://www.43-1.org/~simutrans/simutrans-exp/amd64/
And ask them to get the files with the latest dates?
Starting with the next build (simutrans|makeobj)-exp-latest should redirect to the files including a date. This way the browser should save the files that way as well.
Brilliant, thanks :)
Well so far the 32bit version is running fine for me. I did have reproducible segfaults when making a map with a large number of cities with large numbers of inhabitants but I want to a) use the git-version and b) track it down a bit more precisely before I report that.
As for performance, I agree on the 4GB issue, but doesn't x64 also have substantially more registers (and bigger ones at that) whilst disposing of at least a little bit of the ancient cruft that we carry around since the 8086? I'm no expert with C/C++ but I thought the compiler takes care of this more or less automatically so my gut feeling would be that making simutrans 64bit-safe would be good. Also there are some 64bit arches that do not have hardware support for running 32bit code. And whilst 4GB for a single program may seem obscene now, I think in a few years we'll look at the issue differently. So my vote, realising that I don't get one, goes to continuing the effort for a 64bit version :)
Stand by for the backtrace, this game is just so addictive..
Steffen,
ahh, yes, it would be good to make it 64-bit compatible for portability, if nothing else. It's not my main priority at present (and it's hard for me to test, as I only have a 32-bit machine), but if you (or anyone else) can find the problem and propose a sensible solution, I'll happily fix it :-)
We definitely want simutrans to be 64-bit clean. The thing is, I have no idea what part isn't 64-bit-clean. Nothing is jumping out at me, and I don't have a 64-bit machine to test on. There are all kinds of sloppy integer conversions throughout the code, and sometime I'm going to go through and clean up as many as I possibly can, but that's going to take a long time, so if we can find the actual cause of the problem, that would be best.
Ok it seems the self-compiled version runs just fine. I confirmed with file that it's a 64-bit version.
Can you confirm that the 2010-04-11-fd32563 is the same as the latest git from http://github.com/jamespetts/simutrans-experimental (last date of change is April 10, the commit ID conspicuously starts with fd32563)? I'm really confused.. does anyone have any ideas?
Are you compiling from the -devel branch or the -master branch?
Good question, I never told git specifically so I would assume it's the master branch. Here's the command I used: git clone http://github.com/jamespetts/simutrans-experimental.git
Which branch is used to create the automatic builds?
The Master branch is used to create the automatic builds - if you are building the master branch, you will get Simutrans-Experimental 7.3.
Then I'm really confused, how can it be that the auto-build segfaults but my own build works? What exact settings (and GCC and library versions) are used to make the autobuild, I could try rebuilding with those to try and reduce the number of possible causes.
This is indeed odd. Perhaps Ansgar can assist...?
The autobuild is done in Debian Lenny which has these libraries:
libsdl1.2-dev 1.2.13-2
libsdl-mixer1.2-dev 1.2.8-4
zlib1g-dev 1:1.2.3.3.dfsg-12
libpng12-dev 1.2.27-2+lenny2
libbz2-dev 1.0.5-1
gcc 4:4.3.2-2
And this config.simutrans-exp (http://www.43-1.org/~simutrans/simutrans-exp/config.simutrans-exp).
i tried to compile devel on a x86_64
pulled it that way:
$ git pull <url> devel
$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
i'm using a config.default similar to ansgars (thanks your post helped me a lot)
same flags, no ccache,
it failed at dataobj/einstellungen.o
dataobj/einstellungen.cc:979: error: cast from ‘const char*’ to ‘uint32’ loses precision
changed that to uint64, that compiles
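The "loses precision" error appears because a pointer is 64 bits wide on x86_64 and no longer fits in a 32-bit integer. Switching to a 64-bit type compiles, but an integer type defined to be pointer-sized works on both 32-bit and 64-bit targets. A minimal sketch, assuming the code only needs to store a pointer value as an integer (the thread does not show what einstellungen.cc:979 actually does, and both helper names are hypothetical):

```cpp
#include <cstdint>

// std::uintptr_t is an unsigned integer type wide enough to hold a pointer,
// so this cast is lossless on both 32-bit and 64-bit builds.
inline std::uintptr_t pointer_to_integer(const char* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

// Converting the integer back yields the original pointer.
inline const char* integer_to_pointer(std::uintptr_t v)
{
    return reinterpret_cast<const char*>(v);
}

// Compile-time check of the assumption the helpers rely on.
static_assert(sizeof(std::uintptr_t) >= sizeof(const char*),
              "uintptr_t must be able to hold a pointer");
```

If the integer is later written to a file or compared against a 32-bit value, the surrounding code would need a separate look, since widening the type alone does not make that logic correct.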
next error at
===> CXX simgraph16.cc
simgraph16.cc: In function ‘void rezoom_img(image_id)’:
simgraph16.cc:1290: warning: comparison between signed and unsigned integer expressions
simgraph16.cc: Assembler messages:
simgraph16.cc:2370: Error: suffix or operands invalid for `jmp'
make: *** [simgraph16.o] Error 1
(continued) that's the same point where main branch didn't compile for me last time i tried.
it looks quite like assembler to me. i have quite some problems reading c++ already, but here i'm completely lost.
google is my friend, someone else found that too:
http://bugs.gentoo.org/show_bug.cgi?id=291285
didn't really help me though...
seems like prissi can:
http://archive.forum.simutrans.com/topic/06875.0/index.html
and i overlooked it in ansgars config.default
ifneq ($(shell dpkg-architecture -qDEB_BUILD_ARCH),i386)
FLAGS += -DUSE_C
endif
i misread it as "if equal" and ignored it.
so now, compiling with += -DUSE_C
which works
minor interruption, to get libbz2-dev
linked without error, testing now
it runs, no errors
that's enough for me today, it's late and i'll go chopping* some heads off in mount and blade,
please let me know what further testing of the binary you need.
here's the binary and the log file for download:
http://dl.dropbox.com/u/1876190/simexp-64-8.0-devel
http://dl.dropbox.com/u/1876190/simexp-64-8.0-devel.log
*i should better use the passive here, get my head chopped off, still relaxing.
update: couldn't resist trying a bit and caused this:
Program received signal SIGSEGV, Segmentation fault.
0x000000000059e72c in vector_tpl<koord>::remove_at (this=0x7fffffffb910,
pos=0, preserve_order=false)
at boden/wege/../../dataobj/../tpl/vector_tpl.h:213
213 data[pos] = data[count-1];
error is reproducible, just loaded my savegame (the one i uploaded previously), and waited.
gdb output:
http://dl.dropbox.com/u/1876190/gdb.txt
another update: most of the problems here have been discussed in detail in this thread, only a few days ago:
http://forum.simutrans.com/index.php?topic=5066.msg49722#msg49722
perhaps ccache causes the problem?
this bug "ccache sometimes returns 32bit objects to a 64bit build" is rather old, but perhaps ccache is still worth checking?
http://bugs.gentoo.org/show_bug.cgi?id=196243
Good idea but I don't have ccache installed so that can't be it :(
I'm not sure if you understood me right, my own build appears to work fine (it loads with the experimental britain pak, it loads my save game originally made with the i386 autobuild, I tried running it a couple of minutes) but the autobuild segfaults before it even asks me which pak I want to use.
So we can exclude my run-environment, the actual code and the pak as sources of the problem with the autobuild. (I refer to the problem that it doesn't even launch, not the problem in vector_tpl.h.)
Now off the top of my head I can think of these remaining suspects:
- lack of DEBUG in the 04-11 autobuild causes the problem
- some other difference between sdog's and mine vs. the autobuild config causes it
- the versions of GCC/libraries used in autobuild cause it
- the versions of GCC/libs used during compile are incompatible with the versions used by sdog and me.
Not knowing much of C the last two are my favourites right now, so I'm emerging the old libraries.
sdog: I think it might be worth opening a new thread for the segfault you discovered while running your own build. I think it's safe to assume that that is a different fault than whatever is causing the autobuild to segfault on startup.
No, no, the autobuild might have ccache installed, and that may be causing breakage there.
Ah ok, yes that could be a cause. But the bug linked appears to be specific to Gentoo's handling of having two-in-one platforms, and was marked fixed (by update to an eclass, which are again Gentoo-specific) more than a year ago. But I guess until proven otherwise we should consider that an option.
Ansgar: Is ccache active on the autobuild system?
Is this bug still happening with -DUSE_C? If so please rerun it with the latest devel and give me a new line number, since the old one seems a bit out of date. This is the sort of thing I should be able to solve.
Oh hell, I found a logic bug in vector_tpl in experimental -- nasty off-by-one error.
Don't roll your own container classes, folks!
Patch is on the jp-devel branch of my git repo (git://github.com/neroden/simutrans)
If that doesn't fix it, or if the bug is in standard (which doesn't contain the off-by-one error) then I need two pieces of information from gdb: the output of 'print count' and 'bt'.
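The thread does not show the actual patch, so the following is only a sketch of the bounds checks a `remove_at`-style operation needs. For instance, the crash line `data[pos] = data[count-1];` reads out of range whenever `count` is 0 or `pos` is not below `count`. The function name `remove_at_sketch` is hypothetical, and it is built on `std::vector` instead of a hand-rolled container.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Removes the element at `pos`. Returns false (and changes nothing) if `pos`
// is out of range, which also covers the empty-vector case.
template <typename T>
bool remove_at_sketch(std::vector<T>& v, std::size_t pos, bool preserve_order)
{
    if (pos >= v.size()) {
        return false;
    }
    if (preserve_order) {
        // Shifts the tail down by one; O(n) but keeps the element order.
        v.erase(v.begin() + static_cast<typename std::vector<T>::difference_type>(pos));
    } else {
        // Swap-remove: overwrite the slot with the last element, then shrink.
        if (pos != v.size() - 1) {
            v[pos] = std::move(v.back());
        }
        v.pop_back();
    }
    return true;
}
```

Checking `pos >= v.size()` before touching any element is the step that guards against both an empty container and an index one past the end.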
it still happens with -DUSE_C.
dataobj/einstellungen.cc:982: error: cast from ‘const char*’ to ‘uint32’ loses precision
I pulled vector_tpl from your branch, changed einstellungen.cc:982 to uint64 and compiled it.
i'll let it run on fast time, and go away for a while.
It didn't crash. You must have fixed it.
The game froze however, after playing around in the depot a bit. No input possible, date stuck at 1 February, news ticker in bottom panel continued to run.
killed it with ^c
^C
Program received signal SIGINT, Interrupt.
0x000000000066b8e1 in colorpixcopy (dest=0x891c75e, src=0x11a8754, end=0x11a877e) at simgraph16.cc:1932
1932 while (src < end) {
(gdb) bt
#0 0x000000000066b8e1 in colorpixcopy (dest=0x891c75e, src=0x11a8754, end=0x11a877e) at simgraph16.cc:1932
#1 0x000000000066c3f7 in display_color_img_aux (sp=0x11a8730, x=389, y=471, h=19) at simgraph16.cc:2526
#2 0x000000000066cb58 in display_base_img (n=1638, xp=341, yp=410, player_nr=0 '\000', daynight=0, dirty=1)
at simgraph16.cc:2642
#3 0x00000000004bfa79 in gui_image_list_t::zeichnen (this=0x7899bf8, parent_pos=...)
at gui/components/gui_image_list.cc:99
#4 0x00000000004fe3a9 in gui_container_t::zeichnen (this=0x789a918, offset=...) at gui/gui_container.cc:123
#5 0x00000000004c4e2f in gui_scrollpane_t::zeichnen (this=0x7899d18, pos=...)
at gui/components/gui_scrollpane.cc:148
#6 0x00000000004c63df in gui_tab_panel_t::zeichnen (this=0x78993a8, parent_pos=...)
at gui/components/gui_tab_panel.cc:121
#7 0x00000000004fe3a9 in gui_container_t::zeichnen (this=0x78991f0, offset=...) at gui/gui_container.cc:123
#8 0x00000000004b6227 in gui_convoy_assembler_t::zeichnen (this=0x78991f0, parent_pos=...)
at gui/components/gui_convoy_assembler.cc:518
#9 0x00000000004fe3a9 in gui_container_t::zeichnen (this=0x7898948, offset=...) at gui/gui_container.cc:123
#10 0x00000000004ffed5 in gui_frame_t::zeichnen (this=0x7898940, pos=..., gr=...) at gui/gui_frame.cc:166
#11 0x00000000004e5d7a in depot_frame_t::zeichnen (this=0x7898940, pos=..., groesse=...) at gui/depot_frame.cc:613
#12 0x0000000000616011 in display_win (win=0) at simwin.cc:647
#13 0x00000000006160b3 in display_all_win () at simwin.cc:670
#14 0x0000000000617843 in win_display_flush (konto=223383.38) at simwin.cc:1023
#15 0x00000000005dd956 in intr_refresh_display (dirty=false) at simintr.cc:76
#16 0x00000000006278b0 in karte_t::sync_step (this=0xeb1960, delta_t=133, sync=false, display=true)
at simworld.cc:2792
#17 0x00000000005dda49 in interrupt_check (caller_info=0x6a0c81 "simfab 636") at simintr.cc:101
#18 0x00000000005c3fa3 in fabrik_t::step (this=0x7355440, delta_t=100) at simfab.cc:1068
#19 0x0000000000629d14 in karte_t::step (this=0xeb1960) at simworld.cc:3419
#20 0x0000000000633e46 in karte_t::interactive (this=0xeb1960, quit_month=2147483647) at simworld.cc:5725
---Type <return> to continue, or q <return> to quit---
#21 0x00000000005e6633 in simu_main (argc=1, argv=0x7fffffffe2e8) at simmain.cc:1075
#22 0x00000000006713e3 in main (argc=1, argv=0x7fffffffe2e8) at simsys_s.cc:748
ccache is used for the auto builds, according to the config.default Ansgar posted. Perhaps it's worth a try for him to compile without ccache?
Yes, but I configured ccache to use a different cache directory for amd64 and i386 (IIRC it didn't work at all before). I just cleaned the cache anyway.
This was a silly coding error in experimental. I just fixed it on my jp-devel branch. I'm surprised it worked on 32-bit machines, as it shouldn't have!
The lockup is going to be harder to debug; the place you stopped it is working just fine (as proved by the fact that the ticker keeps running, in a single-threaded program).
Knightly,
I'm probably the silly coder in this case. I'd be very grateful if you could release your updated version; although performance is slower, I find it acceptable enough, and I have made it optional in Experimental in any case for those who don't like the extra time that it takes to generate a map.
Ok it seems I found what causes the segfault on startup with the autobuild. It's either OPTIMISE = 1 or -DNEW_PATHING.
Here's the backtrace for a start, I'll try which of these the final culprit is now and then try it with devel. I realise now that I should've tried devel first, didn't take into account that these flags can influence which code is compiled *sigh*
Program received signal SIGSEGV, Segmentation fault.
0x000000000062997b in display_fb_internal (xp=<value optimized out>, yp=<value optimized out>, w=<value optimized out>, h=88,
color=<value optimized out>, dirty=<value optimized out>, cL=<value optimized out>, cR=<value optimized out>, cT=0,
cB=<value optimized out>) at simgraph16.cc:2959
2959 *lp++ = longcolval;
(gdb) bt
#0 0x000000000062997b in display_fb_internal (xp=<value optimized out>, yp=<value optimized out>, w=<value optimized out>, h=88,
color=<value optimized out>, dirty=<value optimized out>, cL=<value optimized out>, cR=<value optimized out>, cT=0,
cB=<value optimized out>) at simgraph16.cc:2959
#1 0x0000000000629aee in display_fillbox_wh (xp=44, yp=176, w=20338, h=1, color=<value optimized out>, dirty=-135966870)
at simgraph16.cc:2990
#2 0x00000000004d6c1b in gui_frame_t::zeichnen (this=<value optimized out>, pos=..., gr=<value optimized out>) at gui/gui_frame.cc:155
#3 0x00000000004fe59c in pakselector_t::zeichnen (this=0x2c, p=..., gr=<value optimized out>) at gui/pakselector.cc:64
#4 0x00000000005aa74e in ask_objfilename (argc=1, argv=<value optimized out>) at simmain.cc:247
#5 simu_main (argc=1, argv=<value optimized out>) at simmain.cc:644
#6 0x000000000062dc67 in main (argc=1, argv=0x7fffffffdac8) at simsys_s.cc:743
(gdb) quit
Steffen,
the "NEW_PATHING" preprocessor directive has long been deprecated, so that can be eliminated as a possible cause. References to it should be removed. I should also note that none of those parts of the code, so far as I am aware, have had any modifications for Experimental.
yes you're right, I found that activating NEW_PATHING lets the program run fine, whilst OPTIMISE breaks it. Now I looked in the Makefile and the only conclusion I can draw is that it's a toolchain bug. If anyone agrees I'll head over to GCC and post a bugreport.
ansgar: in the meantime, can you deactivate OPTIMISE for the amd64 autobuild?
i thought i compiled it with optimisation. since i'm not on the same machine now, i can't check which level. -O3 would be the most likely however.
What version of GCC are you using?
More likely is a bug in Simutrans. An optimizer may make certain assumptions about the code, but a program could be written in a way that these are not true. In that case it will likely crash or give wrong results.
Sure.
James, I think I'm the one who made the 'silly coding error' comment, and you've already fixed it by pulling from my jp-devel branch. Isn't git wonderful? (Says the new convert who wouldn't use it two months ago!)
One example which would work differently in 64-bit and 32-bit arithmetic is bit-twiddling with & and | -- the bitmasks might be right for 32-bit, wrong for 64-bit. There are a bunch of other things like this.
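A small illustration of the bitmask point, not taken from the Simutrans source: rounding an address down to a multiple of 4. A mask written as a 32-bit constant is fine for 32-bit addresses but silently clears the upper half of a 64-bit one. Both function names are hypothetical.

```cpp
#include <cstdint>

// Mask written with 32-bit values in mind. When combined with a 64-bit
// operand the constant is zero-extended, so bits 32..63 are cleared too.
inline std::uint64_t align4_narrow_mask(std::uint64_t addr)
{
    return addr & 0xFFFFFFFCu;
}

// Mask built in the same width as the operand: only the low two bits change.
inline std::uint64_t align4_wide_mask(std::uint64_t addr)
{
    return addr & ~static_cast<std::uint64_t>(3);
}
```

Both functions agree for any value below 2^32, which is why this kind of bug can stay hidden on a 32-bit build and only show up once addresses or counters exceed 32 bits.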
This reminds me why I use Python for my own programs lol.
In any case I found another reproducible segfault, but I'll post it as a new thread since it appears to be a different issue.
And yes, git is brilliant :)
python... to-morrow i have to look at my fortran 77 code all day long.
you let me dream of tropical paradises with pythons and girls wearing perls and rubies, and certainly weren't named ada.
i'm pretty thankful james decided to use git, having to learn svn is something i gladly pass for more joyful things. well, about twenty million less joyful things spring to my mind though. all are related to horrible deaths, root canal treatments, north-american coffee or having smalltalk with ada.
since i don't want to cp the file after building to my simutrans dir, i just put a symlink on the build in the development directory. it doesn't work. simutrans did not look in my pwd for the pakfile, and exited when it couldn't find one.
Sdog,
I don't think that this is a Simutrans-Experimental specific issue.
Simutrans uses the program directory, where you copied it. If it should use the current directory instead, it must be called with "-use_workdir", see the readme.
thanks prissi, i already expected i made a mistake when compiling
@james, yes it isn't but it is too unimportant to warrant a new thread.
Ok well I definitely can't reproduce that segfault I was getting anymore, either an update or my world recompile fixed it :)