Erlang/OTP Forums


Advanced Erlang/OTP: How to easily/quickly send contents of a file to a socket?

montsamu
Posted: Wed Apr 04, 2007 5:56 pm
Joined: 04 Apr 2007 Posts: 2
{ok,Bin} = file:read_file(Path), ok = gen_tcp:send(Sock,Bin).

This is how Yaws does it (among others), but this hardly seems the best way: it reads the entire file into user memory, then writes the entire file to the socket.

The issues with a loop-based approach of 'read N bytes from the file, write N bytes to the socket, repeat until done' are briefly discussed here:

http://www.erlang.org/ml-archive/erlang-questions/200311/msg00145.html

The sendfile patch in the same mail (not part of Erlang) looked very interesting. That was late 2003; is there still no good, quick, easy way to say 'send the contents of file f to socket s'?
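For later readers: OTP eventually gained a built-in answer to this question. To my knowledge, file:sendfile/2 was added in OTP R15B (2011). Where the OS has a native sendfile it lets the kernel copy the file straight to the socket, and the file module documentation describes an Erlang read/send fallback otherwise. Below is a minimal sketch, assuming R15B or later and an already connected gen_tcp socket. The module and function names are hypothetical.

```erlang
%% Sketch assuming OTP R15B or later, where file:sendfile/2 exists.
%% Module and function names are hypothetical.
-module(sendfile_example).
-export([send_file/2]).

%% Send the whole file at Path over the connected gen_tcp socket Sock.
%% file:sendfile/2 opens the file itself and returns {ok, BytesSent}.
send_file(Path, Sock) ->
    case file:sendfile(Path, Sock) of
        {ok, _BytesSent} -> ok;
        {error, Reason} -> {error, Reason}
    end.
```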
Mazen
Posted: Thu Apr 05, 2007 4:55 pm Reply with quote
User Joined: 20 Jul 2006 Posts: 164 Location: London
I believe it depends a little on what you want to use it for. As far as I know, the answer is "No". There is no way to keep Erlang from copying whatever bytes you read into userspace memory and then back out again (as mentioned in the discussion). On the other hand, I don't think this optimization is needed unless you are dealing with streaming live data (in which case the "file" is a device, and you should use a port driver for it rather than pure Erlang). I can't back this up with hard facts, but I think it is a fair assumption.

Essentially there is only one way to do a transfer, period: send the file piece by piece. The question is how much you can/should send at once, and where to buffer the bytes that can't be sent straight away. I can see your concern about reading the whole file into memory; if a fairly easy optimization is needed, you should work out the best number of bytes to send to your socket at a time, i.e. how big your chunk is. In the Yaws case, the file is buffered wherever the socket wants it buffered, and the VM's memory space probably isn't affected (or only for a short time), since the buffering takes place outside its domain (correct me if I'm wrong, anyone; I'm not 100% sure on this, but at least 98% :)). That should make it faster than reading the file piece by piece, since you don't have two loops reading chunks to send off.
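The single-process "piece by piece" approach with a tunable chunk size could look roughly like this. It is a minimal sketch, not code taken from Yaws or OTP. The module name chunked_send and the 64 KB default chunk size are assumptions, and the chunk size is exactly the parameter worth measuring. The file is opened in raw binary mode, so reads return binaries without going through a separate file server process.

```erlang
%% Single-process chunked transfer: read a chunk, send it, loop.
%% Module name and the 64 KB default chunk size are hypothetical.
-module(chunked_send).
-export([send_file/2, send_file/3]).

-define(DEFAULT_CHUNK, 65536).

send_file(Path, Sock) ->
    send_file(Path, Sock, ?DEFAULT_CHUNK).

send_file(Path, Sock, ChunkSize) ->
    %% raw: no file server process; binary: reads return binaries
    case file:open(Path, [read, raw, binary]) of
        {ok, Fd} ->
            try loop(Fd, Sock, ChunkSize)
            after file:close(Fd)
            end;
        {error, _} = Err ->
            Err
    end.

%% Only one chunk is held in memory at a time.
loop(Fd, Sock, ChunkSize) ->
    case file:read(Fd, ChunkSize) of
        {ok, Chunk} ->
            case gen_tcp:send(Sock, Chunk) of
                ok -> loop(Fd, Sock, ChunkSize);
                {error, _} = Err -> Err
            end;
        eof ->
            ok;
        {error, _} = Err ->
            Err
    end.
```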

/M
montsamu
Posted: Fri Apr 06, 2007 1:53 pm
Joined: 04 Apr 2007 Posts: 2
Hm... good answer and thoughts, thanks very much. What do you think about using Erlang messages for a 2-process approach instead of 2 loops in the same process?

Instead of:

Process Single:
read chunk of file, write chunk to socket, loop

Do:

Process A:
spawn process B
read file and send data to process B via message, loop
when finished send an 'eof' message, end

Process B:
receive message and write to socket, loop
when receive 'eof' message, end
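Here is a minimal sketch of the Process A / Process B layout above, assuming a connected gen_tcp socket. The module name, the message formats ({chunk, Bin} and eof) and the 64 KB default chunk size are hypothetical. The sketch adds one thing the description leaves open: B reports its final status back to A, so the caller knows when the transfer is finished. Nothing here limits how far A can run ahead of B.

```erlang
%% Two-process transfer: A (the caller) reads, B writes to the socket.
%% Names, message formats and the 64 KB default are hypothetical.
-module(two_proc_send).
-export([send_file/2, send_file/3]).

send_file(Path, Sock) ->
    send_file(Path, Sock, 65536).

send_file(Path, Sock, ChunkSize) ->
    case file:open(Path, [read, raw, binary]) of
        {ok, Fd} ->
            Parent = self(),
            %% Process B: writes each chunk it receives to the socket.
            Writer = spawn_link(fun() -> writer(Parent, Sock, ok) end),
            ReadResult = try reader(Fd, Writer, ChunkSize)
                         after file:close(Fd)
                         end,
            %% Wait for B to finish and report the outcome of its writes.
            WriteResult = receive {Writer, Result} -> Result end,
            case ReadResult of
                eof -> WriteResult;
                {error, _} = ReadErr -> ReadErr
            end;
        {error, _} = OpenErr ->
            OpenErr
    end.

%% Process A: read a chunk, message it to B, repeat; send eof at the end.
%% A never waits for B, so B's mailbox can grow without bound.
reader(Fd, Writer, ChunkSize) ->
    case file:read(Fd, ChunkSize) of
        {ok, Chunk} ->
            Writer ! {chunk, Chunk},
            reader(Fd, Writer, ChunkSize);
        Done ->                          % eof or {error, Reason}
            Writer ! eof,
            Done
    end.

%% Process B: write chunks in arrival order. After a failed send it keeps
%% draining its mailbox but skips further writes.
writer(Parent, Sock, Status) ->
    receive
        {chunk, Chunk} when Status =:= ok ->
            writer(Parent, Sock, gen_tcp:send(Sock, Chunk));
        {chunk, _Chunk} ->
            writer(Parent, Sock, Status);
        eof ->
            Parent ! {self(), Status}
    end.
```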

I guess the main concern would be that the mailbox, or the VM's memory, fills up if you read much faster than you send (which you likely will), although I've never managed to fill a mailbox even with runaway programs. Since only one process (A) is doing the sending, messages are (IIRC) guaranteed to be received in order, but I might be wrong about that as well (particularly on an SMP-enabled system)? The fix for that, I suppose, is just to add a chunk number N to the message and to process B's loop state.
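On ordering: Erlang guarantees that messages sent from one process to another arrive in the order they were sent, so a single reader feeding a single writer needs no chunk numbers. The mailbox concern can be addressed with acknowledgements. B replies after writing each chunk, and A waits for that reply before handing over the next chunk. B's backlog is then bounded to one chunk, while A can still read one chunk ahead. This is one common flow-control pattern, shown as a minimal sketch. Module name, message formats and the 64 KB default chunk size are hypothetical.

```erlang
%% Two-process transfer with per-chunk acknowledgements (window of one).
%% Names, message formats and the 64 KB default are hypothetical.
-module(acked_send).
-export([send_file/2, send_file/3]).

send_file(Path, Sock) ->
    send_file(Path, Sock, 65536).

send_file(Path, Sock, ChunkSize) ->
    case file:open(Path, [read, raw, binary]) of
        {ok, Fd} ->
            Parent = self(),
            Writer = spawn_link(fun() -> writer(Parent, Sock, ok) end),
            ReadResult = try reader(Fd, Writer, ChunkSize, false)
                         after file:close(Fd)
                         end,
            WriteResult = receive {Writer, {done, Status}} -> Status end,
            case ReadResult of
                eof -> WriteResult;
                {error, _} = ReadErr -> ReadErr
            end;
        {error, _} = OpenErr ->
            OpenErr
    end.

%% Process A: reads the next chunk while B is still writing the previous
%% one, but only hands it over once B has acknowledged that write.
reader(Fd, Writer, ChunkSize, InFlight) ->
    case file:read(Fd, ChunkSize) of
        {ok, Chunk} ->
            wait_ack(Writer, InFlight),
            Writer ! {chunk, Chunk},
            reader(Fd, Writer, ChunkSize, true);
        Done ->                          % eof or {error, Reason}
            wait_ack(Writer, InFlight),
            Writer ! eof,
            Done
    end.

wait_ack(_Writer, false) -> ok;
wait_ack(Writer, true) ->
    receive {Writer, ack} -> ok end.

%% Process B: write the chunk (skipped after an earlier failure), then ack.
writer(Parent, Sock, Status) ->
    receive
        {chunk, Chunk} ->
            NewStatus = case Status of
                            ok -> gen_tcp:send(Sock, Chunk);
                            _ -> Status
                        end,
            Parent ! {self(), ack},
            writer(Parent, Sock, NewStatus);
        eof ->
            Parent ! {self(), {done, Status}}
    end.
```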

Another thought: is there a foreign function interface to Erlang? That would make adding sendfile a lot easier (and more portable) than patching the Erlang source.

Heh. Maybe the Yaws guys have it right in the end. Just read the whole file, send it, and be done with it.
Mazen
Posted: Mon Apr 16, 2007 8:15 am
User Joined: 20 Jul 2006 Posts: 164 Location: London
I would 100% go for one process reading chunks from a file and sending them to the socket. Having two processes as you suggest would just over-complicate a problem that is, at best, barely noticeable.

I'm not sure what you mean by "foreign function interface", but you could write a linked-in driver or a port driver for sending the data. You write those in C or C++, and you can find information on how to do that on the wiki and in the documentation. However... I strongly believe that would be overdoing it :)

If you want to use an already-built tool, you can invoke it from Erlang with the os:cmd/1 function. Have a look at the documentation.

The best way to really know which is faster, sending chunk by chunk or reading the whole binary and pushing it to the socket, is to try it :) Maybe the Yaws way is the best way... but I doubt it ;) (at least for large files, say 15 MB and up).
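In the spirit of "try it", here is a rough timing sketch. It compares the Yaws-style file:read_file/1 plus a single gen_tcp:send/2 against a single-process chunked loop, over a loopback TCP connection. Everything here is hypothetical scaffolding: the module name, the loopback setup, and the separate process that drains the receiving side. It uses timer:tc/1 to time only how long the sending call takes to return, not end-to-end delivery. Results will depend on file size, chunk size and OS socket buffers.

```erlang
%% Rough timing sketch: whole-file send vs. chunked loop over loopback TCP.
%% Module and function names are hypothetical.
-module(send_bench).
-export([compare/2]).

%% Returns {WholeFileMicros, ChunkedMicros}.
compare(Path, ChunkSize) ->
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(L),
    Whole = time_one(L, Port, fun(Sock) -> whole_file(Path, Sock) end),
    Chunked = time_one(L, Port, fun(Sock) -> chunked(Path, Sock, ChunkSize) end),
    ok = gen_tcp:close(L),
    {Whole, Chunked}.

%% Open a loopback connection, give the receiving side to a process that
%% drains it, and time SendFun on the sending side.
time_one(L, Port, SendFun) ->
    {ok, C} = gen_tcp:connect({127, 0, 0, 1}, Port, [binary, {active, false}]),
    {ok, S} = gen_tcp:accept(L),
    Self = self(),
    Drainer = spawn(fun() ->
                        receive go -> ok end,
                        drain(S),
                        Self ! {self(), drained}
                    end),
    ok = gen_tcp:controlling_process(S, Drainer),
    Drainer ! go,
    {Micros, ok} = timer:tc(fun() -> SendFun(C) end),
    ok = gen_tcp:close(C),
    receive {Drainer, drained} -> ok end,
    Micros.

drain(S) ->
    case gen_tcp:recv(S, 0) of
        {ok, _Data} -> drain(S);
        {error, _} -> gen_tcp:close(S)
    end.

%% The approach from the first post: read everything, send once.
whole_file(Path, Sock) ->
    {ok, Bin} = file:read_file(Path),
    gen_tcp:send(Sock, Bin).

%% One process, one chunk at a time; crashes on any error (benchmark only).
chunked(Path, Sock, ChunkSize) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try chunk_loop(Fd, Sock, ChunkSize)
    after file:close(Fd)
    end.

chunk_loop(Fd, Sock, ChunkSize) ->
    case file:read(Fd, ChunkSize) of
        {ok, Chunk} ->
            ok = gen_tcp:send(Sock, Chunk),
            chunk_loop(Fd, Sock, ChunkSize);
        eof ->
            ok
    end.
```

For example, send_bench:compare("big_file.bin", 65536) (a hypothetical file name) returns a pair of microsecond timings. Running it several times with different chunk sizes and file sizes gives a more useful picture than a single run.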

Cheers :)

All times are GMT