Erlang ~ Unable to get the HTML in a list from a website
Volume
Posted: Thu Jan 01, 2009 2:22 am
Joined: 01 Jan 2009
Posts: 5
I followed an example in Programming Erlang (p. 240) on how to fetch the HTML of www.google.com using HTTP/1.0. I put my own twist on it, trying to get the same thing from CNN.
Here's my code:
Code:
-module(webRetrieval).
-compile(export_all).

%-define(url_to_get, "www.cnn.com").
-define(url_to_get, "www.cnn.com/WORLD").

get_site() ->
    {ok, Socket} = gen_tcp:connect(?url_to_get, 80, [binary, {packet, 0}]),
    ok = gen_tcp:send(Socket, "GET / HTTP/1.0\r\n\r\n"),
    receive_data(Socket, []).

receive_data(Socket, SoFar) ->
    receive
        {tcp, Socket, Bin} ->
            receive_data(Socket, [Bin | SoFar]);
        {tcp_closed, Socket} ->
            list_to_binary(lists:reverse(SoFar))
    end.
When I use the URL ending in ".com", it works just fine. However, if I add /WORLD/ (in this case), I get the error below:
Code:
49> webRetrieval:get_site().
=ERROR REPORT==== 31-Dec-2008::20:36:17 ===
Error in process <0.183.0> with exit value: {{badmatch,{error,nxdomain}},[{webRetrieval,get_site,0},{erl_eval,do_apply,5},{shell,exprs,6},{shell,eval_loop,3}]}
** exited: {{badmatch,{error,nxdomain}},
            [{webRetrieval,get_site,0},
             {erl_eval,do_apply,5},
             {shell,exprs,6},
             {shell,eval_loop,3}]} **
Why is this happening? I ran it through the built-in debugger and got the error below:
Code:
58> webRetrieval:get_site().
=ERROR REPORT==== 31-Dec-2008::21:18:04 ===
Error in process <0.203.0> with exit value: {{badmatch,{error,nxdomain}},[{webRetrieval,get_site,[]}]}
** exited: {{badmatch,{error,nxdomain}},[{webRetrieval,get_site,[]}]} **
I looked up what gen_tcp:connect does. It takes an address, which can be a string. Thoughts? Have I misused a particular function?
Mazen
Posted: Thu Jan 01, 2009 1:10 pm
User
Joined: 20 Jul 2006
Posts: 164
Location: London
When you connect to something, you can only connect to a host, i.e. an IP address or a host name that resolves to one. The TCP layer has no idea what "/WORLD" means, so gen_tcp:connect tries to resolve the whole string "www.cnn.com/WORLD" as a host name and the lookup fails with nxdomain. That is why you should connect to www.cnn.com (which resolves to an IP) and then do a GET on the /WORLD/ part:
"GET /WORLD/ HTTP/1.x" (choose x)
Good luck
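A minimal sketch of that suggestion, assuming the URL is given as "host/path" with no "http://" prefix and no port, as in the original post. The module name web_fetch and the functions split_url/1 and get_url/1 are made up for illustration; receive_data/2 is the loop from the first post.

```erlang
-module(web_fetch).
-export([split_url/1, get_url/1]).

%% Split "host/path" into {Host, Path}; a bare host gets the path "/".
%% Hypothetical helper: assumes no scheme prefix and no port in Url.
split_url(Url) ->
    case lists:splitwith(fun(C) -> C =/= $/ end, Url) of
        {Host, ""}   -> {Host, "/"};
        {Host, Path} -> {Host, Path}
    end.

%% Connect to the host only, then name the path in the GET line.
get_url(Url) ->
    {Host, Path} = split_url(Url),
    {ok, Socket} = gen_tcp:connect(Host, 80, [binary, {packet, 0}]),
    ok = gen_tcp:send(Socket, ["GET ", Path, " HTTP/1.0\r\n\r\n"]),
    receive_data(Socket, []).

%% Collect everything the server sends until it closes the connection.
receive_data(Socket, SoFar) ->
    receive
        {tcp, Socket, Bin} ->
            receive_data(Socket, [Bin | SoFar]);
        {tcp_closed, Socket} ->
            list_to_binary(lists:reverse(SoFar))
    end.
```

With this, web_fetch:get_url("www.cnn.com/WORLD/") would connect to www.cnn.com and send "GET /WORLD/ HTTP/1.0". What the server answers is up to the server.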
Volume
Posted: Thu Jan 01, 2009 4:18 pm
Joined: 01 Jan 2009
Posts: 5
I see. Thank you. I'm not shy to admit that my understanding of online protocols is lacking. I'll make sure to read up on this.
Volume
Posted: Sat Jan 03, 2009 6:17 am
Joined: 01 Jan 2009
Posts: 5
Mazen wrote: "GET /WORLD/ HTTP/1.x" (choose x)
After doing some more testing, I have to ask: why 'x'? When I put x in place of 0, I got an Error 400 Bad Request.
Mazen
Posted: Sat Jan 03, 2009 1:27 pm
User
Joined: 20 Jul 2006
Posts: 164
Location: London
Volume wrote:
Mazen wrote: "GET /WORLD/ HTTP/1.x" (choose x)
After doing some more testing, I have to ask: why 'x'? When I put x in place of 0, I got an Error 400 Bad Request.
The x stands for the HTTP protocol version, namely 1.0 or 1.1.
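To make the two choices concrete, here is a rough sketch of a request builder that only accepts 0 or 1 for the minor version. The module http_req and the function request/3 are hypothetical names, not part of OTP. HTTP/1.1 requires a Host header, and sending one with 1.0 is harmless. "Connection: close" is added on the assumption that the caller reads until the server closes the socket, as the receive loop in this thread does.

```erlang
-module(http_req).
-export([request/3]).

%% Build a GET request as iodata for HTTP/1.0 (Minor = 0) or
%% HTTP/1.1 (Minor = 1). Anything else, such as the atom x,
%% fails the guard with a function_clause error.
request(Host, Path, Minor) when Minor =:= 0; Minor =:= 1 ->
    ["GET ", Path, " HTTP/1.", integer_to_list(Minor), "\r\n",
     "Host: ", Host, "\r\n",
     "Connection: close\r\n",
     "\r\n"].
```

The result can be passed straight to gen_tcp:send/2, which accepts iodata.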
I did some testing myself. Sending:
Code:
GET /WORLD HTTP/1.0\r\n\r\n
returned:
Code:
HTTP/1.1 301 Moved Permanently
Date: Sat, 03 Jan 2009 13:21:50 GMT
Server: Apache
Location: http://cnns1fill8/WORLD/
Cache-Control: max-age=60, private
Expires: Sat, 03 Jan 2009 13:22:50 GMT
Content-Length: 292
Content-Type: text/html; charset=iso-8859-1
Vary: User-Agent,Accept-Encoding
Connection: close

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://cnns1fill8/WORLD/">here</a>.</p>
<hr>
<address>Apache Server at cnns1fill8 Port 80</address>
</body></html>
In other words, it works for me, but I get a 301, which is fine.
Do _not_ put the literal x in the request. It is a placeholder that should be 0 or 1.
Sorry for the confusion =)
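For anyone who wants to act on a 301 like the one above, here is a small sketch that pulls the status code and the Location header out of the raw response binary. The module http_resp and its functions are invented for this example. The header name is matched exactly as "Location", which is a simplification because header names are case-insensitive.

```erlang
-module(http_resp).
-export([status/1, location/1]).

%% Return the numeric status code from the first line of a raw
%% response, e.g. the 301 in "HTTP/1.1 301 Moved Permanently".
status(Response) ->
    [StatusLine | _] = binary:split(Response, <<"\r\n">>),
    [_Version, Code | _] = binary:split(StatusLine, <<" ">>, [global]),
    list_to_integer(binary_to_list(Code)).

%% Return {ok, Url} for the Location header, or none if it is absent.
%% Only the header section (before the first blank line) is searched.
location(Response) ->
    [Head | _] = binary:split(Response, <<"\r\n\r\n">>),
    Lines = binary:split(Head, <<"\r\n">>, [global]),
    find_location(Lines).

find_location([<<"Location: ", Url/binary>> | _]) -> {ok, Url};
find_location([_ | Rest]) -> find_location(Rest);
find_location([]) -> none.
```

Note that the Location in the output above points at cnns1fill8, which looks like an internal server name, so requesting that address from outside may not work. That is only a guess from the response shown here.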
All times are GMT