Erlang/OTP Forums

Advanced Erlang/OTP ~ Mnesia index foldl()

datacompboy
Posted: Sat May 26, 2007 5:47 am
User Joined: 21 Sep 2006 Posts: 69 Location: Novosibirsk, Russia
Is there any way to foldl() over an index value?
I have a log in the database, indexed on date (records have date and time fields; the index is on date only).
I need to select all records for a date, filter the result on another field, and calculate the sum of a third field. For example, the standard statistic is:
%% Must be called inside a Mnesia activity (e.g. mnesia:transaction/1).
%% Types must be an ordset.
stat(Date, Types) ->
    TypeFilter = fun(#stats{type = X}) -> ordsets:is_element(X, Types) end,
    CalcStat = fun(#stats{count = X}, S) -> S + X end,
    Log = mnesia:index_read(statlog, Date, #stats.date),
    TypeLog = lists:filter(TypeFilter, Log),
    lists:foldl(CalcStat, 0, TypeLog).
So it loads the whole log for that date into memory (up to 10k records), although the records are needed only for filtering and summing.

So, is there a way to foldl() over the index without reading all the records for the key?

_________________
--- suicide proc near\n call death\n suicide endp
francesco
Posted: Tue May 29, 2007 2:07 pm
User Joined: 07 Jul 2006 Posts: 249 Location: London
To avoid allocating large amounts of memory, I use dirty_first and dirty_next. To speed things up a little (not a pretty solution), I go straight to the ets operations on the table. If other processes are doing destructive operations on the table, use safe_fixtable to avoid a badarg in case of rehashing. Make sure you test this well while stressing your system.
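
A rough sketch of the dirty_first/dirty_next traversal, using the Mnesia calls rather than raw ets. The module name stat_walk, the id field as primary key and the exact record layout are assumptions; the thread only names the date, time, type and count fields. It walks every key in the table, so it trades speed for a small memory footprint.

```erlang
-module(stat_walk).
-export([stat/2]).

%% Assumed record layout; id as the primary key is hypothetical.
-record(stats, {id, date, time, type, count}).

%% Visits one key at a time, so only the records stored under that key
%% are held in memory at any moment. Types is expected to be an ordset.
stat(Date, Types) ->
    walk(mnesia:dirty_first(statlog), Date, Types, 0).

walk('$end_of_table', _Date, _Types, Sum) ->
    Sum;
walk(Key, Date, Types, Sum) ->
    Recs = mnesia:dirty_read(statlog, Key),
    Next = mnesia:dirty_next(statlog, Key),
    Sum1 = lists:foldl(fun(R, S) -> add(R, Date, Types, S) end, Sum, Recs),
    walk(Next, Date, Types, Sum1).

%% Adds the count only when the record's date equals Date
%% and its type is one of Types.
add(#stats{date = Date, type = T, count = C}, Date, Types, Sum) ->
    case ordsets:is_element(T, Types) of
        true  -> Sum + C;
        false -> Sum
    end;
add(_Other, _Date, _Types, Sum) ->
    Sum.
```

Being dirty operations, these give no protection against concurrent deletes of the key being visited, which is why the post recommends fixing the table and testing under load.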

Otherwise, mnesia:foldl works in a similar way. The clean way of using it would be to do the filtering yourself in the Fun you pass to it, rather than generating the TypeLog list as you do now. At the expense of some computation, you would use less memory.
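
A minimal sketch of that idea: the date and type tests move into the Fun given to mnesia:foldl/3, so no intermediate list is built. The module name stat_fold and the record layout (id as key) are assumptions. Note that mnesia:foldl/3 iterates over the whole statlog table and does not use the date index.

```erlang
-module(stat_fold).
-export([stat/2]).

%% Assumed record layout; id as the primary key is hypothetical.
-record(stats, {id, date, time, type, count}).

%% Sums count over records matching Date and Types in a single pass.
%% Types is expected to be an ordset.
stat(Date, Types) ->
    Fold = fun(#stats{date = D, type = T, count = C}, Sum) when D =:= Date ->
                   case ordsets:is_element(T, Types) of
                       true  -> Sum + C;
                       false -> Sum
                   end;
              (_Other, Sum) ->
                   Sum
           end,
    %% mnesia:foldl/3 needs an activity context; a transaction is used here.
    {atomic, Total} = mnesia:transaction(fun() -> mnesia:foldl(Fold, 0, statlog) end),
    Total.
```

The guard is needed because a variable in a fun head would shadow the outer Date instead of matching against it.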

I hope this helps,
Francesco
datacompboy
Posted: Tue May 29, 2007 3:02 pm
francesco wrote:
Otherwise, mnesia:foldl works in a similar way.


foldl() doesn't use the index; it folds over the whole table, and since the table is quite large, foldl() is slower than index_read + lists:foldl :(

I have found
http://www.erlang.org/ml-archive/erlang-questions/200502/msg00251.html
It dates from 2005, but it has still not been merged into the main mnesia tree, and I'm not sure whether it can be integrated into the latest mnesia...

--
wbr, Anton

francesco
Posted: Tue May 29, 2007 5:31 pm
Even if you did foldl on the index table, you would have to traverse it all, as it is a bag and not an ordered set. So you would still be using as much memory as you are now... (Even Ulf, in his post, states that he will not attempt to fix the order in a bag.) So even if it comes close, I am not sure it would fix your problem.

Depending on the size of the table, I would be careful with large memory allocations you have little control over. Running out of memory is the most common reason for a VM crash in live systems. So even if first/next is slower, it is also safer.

Francesco
datacompboy
Posted: Wed May 30, 2007 10:43 am
francesco wrote:
Even if you did foldl on the index table, you would have to traverse it all, as it is a bag and not an ordered set.

So, is mnesia not usable for building fast statistics, or should I keep a separate table with pre-calculated ones?
Hmm... maybe I should do a daily export of the data into an external SQL database...

francesco
Posted: Wed May 30, 2007 12:53 pm
For statistics, we use mnesia:dirty_update_counter (or similar in ets). It is an atomic operation. From there, we calculate the delta over various time intervals and store the data to plot graphs (possibly using external tools such as HP OpenView, or, in integrated O&Ms, using gnuplot or Flash plugins). I am, however, unsure whether that helps with what you are trying to achieve.
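
As an illustration only, counters like this could also serve as the pre-calculated table asked about above. The stat_counter table, its {Date, Type} key and the function names are hypothetical, not something described in this thread. mnesia:dirty_update_counter/3 requires a table of type set (the default) whose records have a single integer value field.

```erlang
-module(stat_counter).
-export([create_table/0, bump/3, stat/2]).

%% Hypothetical counter table: one row per {Date, Type}.
-record(stat_counter, {key, count = 0}).

%% Creates a RAM table of type set (the default table type).
create_table() ->
    mnesia:create_table(stat_counter,
                        [{attributes, record_info(fields, stat_counter)}]).

%% Atomically adds N to the counter for {Date, Type}; call this whenever
%% a log record is written. Returns the new counter value.
bump(Date, Type, N) ->
    mnesia:dirty_update_counter(stat_counter, {Date, Type}, N).

%% Reads one small row per requested type instead of the day's whole log.
stat(Date, Types) ->
    lists:sum([C || T <- Types,
                    #stat_counter{count = C}
                        <- mnesia:dirty_read(stat_counter, {Date, T})]).
```

The cost moves to write time: each log insert also updates one counter, and the query then touches at most one row per type.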

Francesco
All times are GMT