{"id":24,"date":"2011-02-10T21:19:18","date_gmt":"2011-02-10T21:19:18","guid":{"rendered":"http:\/\/ulrichard.is-a-geek.net\/?p=24"},"modified":"2011-02-10T21:19:18","modified_gmt":"2011-02-10T21:19:18","slug":"convert-news-web-page-into-rss-feed","status":"publish","type":"post","link":"https:\/\/ulrichard.ch\/blog\/?p=24","title":{"rendered":"Convert news web page into rss feed"},"content":{"rendered":"<p>At the company where I work (<a title=\"BORM\" href=\"http:\/\/www.bormgruppe.ch\">BORM<\/a>), we have an internal web page with the company news. We&#8217;re required to log in and check it every once in a while.<\/p>\n<p>To get notified automatically when there is something new, I hacked together a PHP script that logs in, downloads the page, and parses the news. It then serves an RSS XML file to my feed reader.<\/p>\n<p>In the past I didn&#8217;t do a lot of PHP scripting, only editing some bits and pieces, so this is my biggest PHP endeavor so far.<\/p>\n<p>Tested with <a href=\"http:\/\/liferea.sourceforge.net\">liferea<\/a> and <a href=\"http:\/\/www.beyondpod.mobi\/\">BeyondPod<\/a>.<\/p>\n<p>I won&#8217;t tell you where the script can be accessed, as the company news requires a login. But the script itself is here:<\/p>\n<p><!--more--><\/p>\n<p>&lt;?php<br \/>\nheader('Content-type: application\/xml');<\/p>\n<p>echo &quot;&lt;rss version=\\&quot;2.0\\&quot;&gt;\\n&quot;;<br \/>\necho &quot;\\t&lt;channel&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t&lt;title&gt;BormIntern&lt;\/title&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t&lt;description&gt;News aus dem BORM Point&lt;\/description&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t&lt;language&gt;de&lt;\/language&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t&lt;link&gt;http:\/\/secret.news.borm.ch&lt;\/link&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t&lt;lastBuildDate&gt;Thu, 10 
Feb 2011 00:00:00 GMT&lt;\/lastBuildDate&gt;\\n&quot;;<\/p>\n<p>\/\/ temporary file that keeps the session cookie between the two requests<br \/>\n$ckfile = tempnam(&quot;\/tmp&quot;, &quot;CURLCOOKIE&quot;);<\/p>\n<p>\/\/ first request: fetch the login page so that the session cookie gets set<br \/>\n$ch = curl_init();<br \/>\ncurl_setopt($ch, CURLOPT_URL, &quot;http:\/\/secret.news.borm.ch\/login.php&quot;);<br \/>\ncurl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);<br \/>\ncurl_setopt($ch, CURLOPT_USERAGENT, &quot;php script to generate an rss feed for the BORM news&quot;);<br \/>\ncurl_setopt($ch, CURLOPT_RETURNTRANSFER, true);<br \/>\ncurl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);<br \/>\n$output = curl_exec($ch);<\/p>\n<p>\/\/ second request: post the login form and follow the redirect to the news page<br \/>\ncurl_setopt($ch, CURLOPT_POST, true);<br \/>\ncurl_setopt($ch, CURLOPT_POSTFIELDS, &quot;Schritt=2&amp;Seite=index.asp&amp;sel_Adr=150&amp;Passwort=mysecretpassword&amp;submit=Login&quot;);<br \/>\ncurl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);<br \/>\ncurl_setopt($ch, CURLOPT_REFERER, &quot;http:\/\/secret.news.borm.ch\/Login.php?Seite=index.php&quot;);<br \/>\ncurl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);<br \/>\n$output = curl_exec($ch);<\/p>\n<p>curl_close($ch);<\/p>\n<p>$dom = new domDocument;<br \/>\nif(!$dom-&gt;loadHTML($output))<br \/>\necho &quot;failed to parse the html&quot;;<br \/>\n$dom-&gt;preserveWhiteSpace=false;<br \/>\n\/\/ the news rows live in the second-to-last table of the page<br \/>\n$tables = $dom-&gt;getElementsByTagName('table');<br \/>\n$tableid = $tables-&gt;length - 2;<br \/>\n$rows = $tables-&gt;item($tableid)-&gt;getElementsByTagName('tr');<\/p>\n<p>$firstitem = true;<br \/>\nforeach($rows as $row)<br \/>\n{<br \/>\n$cols = $row-&gt;getElementsByTagName('td');<\/p>\n<p>if($cols-&gt;length &gt; 0 &amp;&amp; strlen($cols-&gt;item(0)-&gt;nodeValue) &gt; 0)<br \/>\n{<br \/>\n$msgtext = $cols-&gt;item(0)-&gt;nodeValue;<\/p>\n<p>\/\/ a row containing &lt;strong&gt; is a headline and starts a new feed item<br \/>\nif($cols-&gt;item(0)-&gt;getElementsByTagName('strong')-&gt;length &gt; 0)<br \/>\n{<br \/>\nif(!$firstitem)<br \/>\n{<br \/>\necho &quot;\\t\\t\\t&lt;\/description&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t&lt;\/item&gt;\\n&quot;;<br \/>\n}<br \/>\n$firstitem = 
false;<\/p>\n<p>echo &quot;\\t\\t&lt;item&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t\\t&lt;title&gt;&quot;;<br \/>\necho $msgtext;<br \/>\necho &quot;&lt;\/title&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t\\t&lt;link&gt;http:\/\/secret.news.borm.ch&lt;\/link&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t\\t&lt;pubDate&gt;Wed, 9 Feb 2011 00:00:00 GMT&lt;\/pubDate&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t\\t&lt;guid&gt;&quot;;<br \/>\necho md5($msgtext);<br \/>\necho &quot;&lt;\/guid&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t\\t&lt;description&gt;&quot;;<br \/>\n}<br \/>\nelse if($cols-&gt;length &gt; 1 &amp;&amp; $cols-&gt;item(1)-&gt;getElementsByTagName('a')-&gt;length)<br \/>\n{<br \/>\n$link = $cols-&gt;item(1)-&gt;getElementsByTagName('a')-&gt;item(0);<br \/>\necho &quot;&lt;a href=\\&quot;http:\/\/secret.news.borm.ch\/&quot; . $link-&gt;getAttribute(&quot;href&quot;) . &quot;\\&quot;&gt;&quot; . $link-&gt;childNodes-&gt;item(0)-&gt;nodeValue . &quot;&lt;\/a&gt;&lt;br \/&gt;\\n&quot;;<br \/>\n}<br \/>\nelse<br \/>\n{<br \/>\necho $msgtext;<br \/>\necho &quot;&lt;br \/&gt;\\n&quot;;<br \/>\n}<br \/>\n}<br \/>\n}<br \/>\nif(!$firstitem)<br \/>\n{<br \/>\necho &quot;\\t\\t\\t&lt;\/description&gt;\\n&quot;;<br \/>\necho &quot;\\t\\t&lt;\/item&gt;\\n&quot;;<br \/>\n}<\/p>\n<p>echo &quot;\\t&lt;\/channel&gt;\\n&quot;;<br \/>\necho &quot;&lt;\/rss&gt;\\n&quot;;<\/p>\n<p>?&gt;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At the company where I work (BORM), we have an internal web page with the company news. We&#8217;re required to log in and check it every once in a while. To get notified automatically when there is something new, I hacked together a PHP script that logs in, downloads the page, and parses the news. 
It [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,1],"tags":[172],"class_list":["post-24","post","type-post","status-publish","format-standard","hentry","category-software","category-uncategorized","tag-php"],"_links":{"self":[{"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24"}],"version-history":[{"count":0,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24\/revisions"}],"wp:attachment":[{"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}