
readfile() and file_get_contents() PHP Functions

March 25, 2016

This post is about the readfile() and file_get_contents() functions in PHP.

Let’s say you want to display an external web page on your own domain name, or even on a domain that isn’t yours. The readfile() function can do it, provided allow_url_fopen is enabled in your PHP configuration so that it can open URLs. Here’s an example:

<?php readfile('http://php.net/manual/en/function.readfile.php'); ?>

http://php.net/manual/en/function.readfile.php

So what if you want to customize the style of the page output?

That’s not a problem either. Just reference your own stylesheet in the file that reads the external page:

<link href="http://jarodthornton.com/iemajen/wordpress/readfile/style.css" rel="stylesheet">

In this example I’ve replaced the external page logo with my own.

a.brand {
    background: url(https://emajen.com/secure/i/site-logo-dark.png) no-repeat;
    background-size: 135px;
    margin: 10px 0px 0px 10px;
    width: 100px;
}
.brand img {
    display: none;
}

Why readfile?

The readfile() function dumps the entire contents of a file or web page into the output of the page where your PHP is written. It does have its drawbacks: it sends the content straight to the output and returns only the number of bytes read, so without extra work (such as output buffering) you can’t pass the data to other functions for processing.
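
To see why readfile() can’t hand its data to other functions, it helps to look at what it returns. The sketch below uses a temporary local file as a stand-in for the remote page (the file and its content are assumptions for the demo). It shows that readfile() prints the content and returns only a byte count, and that PHP’s output buffering functions, ob_start() and ob_get_clean(), can capture the printed content as a string if you really need it.

```php
<?php
// Minimal sketch: a temporary local file stands in for the page being read.
$path = tempnam(sys_get_temp_dir(), 'readfile_demo_');
file_put_contents($path, '<p>Hello from the source page</p>');

// readfile() writes the file straight to the output and returns the
// number of bytes read (or false on failure), not the content itself.
// Wrapping it in output buffering captures that output as a string.
ob_start();
$bytes = readfile($path);
$captured = ob_get_clean();

unlink($path);

echo "readfile() returned: $bytes\n";
echo "Captured output: $captured\n";
```

If you find yourself reaching for output buffering just to edit the result, file_get_contents() is usually the more direct tool.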

I’ve used this approach a few times for client projects.

In one project we wanted to take four WordPress sites and consolidate them into one. Each site had its own domain name and branding, so we needed control over the style. I simply created a page in WordPress for the second site (in this case a single landing page), then mapped the second domain to the directory containing the readfile() script. Now that site can be updated from the parent WordPress site while still displaying at its own domain name.
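
In the setup described, each domain is mapped to its own directory. If you would rather keep a single script for several domains, one hypothetical variation is to look up the requested host name and pull the matching WordPress page. This is a sketch of that alternative, not the code used in the project; the host names, URLs, and the source_for_host() helper are all placeholders.

```php
<?php
// Hypothetical map of mapped domains to the WordPress pages they display.
// Every host name and URL here is a placeholder.
$sources = [
    'second-site.example' => 'https://parent-site.example/second-site-landing/',
    'third-site.example'  => 'https://parent-site.example/third-site-landing/',
];

// Look up the page for a Host header value, ignoring case and any port.
function source_for_host(string $host, array $sources): ?string
{
    $name = explode(':', strtolower($host))[0];
    return $sources[$name] ?? null;
}

// On a real request the host would come from $_SERVER['HTTP_HOST'];
// a fixed value is used here so the sketch runs from the command line.
$url = source_for_host('Second-Site.example:443', $sources);

// In the live script you would now output the page, e.g. readfile($url),
// and send a 404 response when $url is null.
echo $url ?? 'Unknown site', "\n";
```

The trade-off is one place to maintain the mapping versus one small script per directory, which is simpler but repeats itself for every site.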

Can the content be changed?

You can include your own HTML. The drawback is that your HTML is output outside the page’s actual document tags, before the pulled page’s own <html> element. Here’s an example:

<link href="http://jarodthornton.com/iemajen/wordpress/readfile/style.css" rel="stylesheet">

<div class="header">
Hello World
</div>

<?php readfile('http://php.net/manual/en/function.readfile.php'); ?>

I also updated the CSS to style my HTML:

.header {
    position: absolute;
    top: 0px;
    left: 0px;
    z-index: 999;
    margin: 10px 0px 0px 10px;
}
nav#head-nav {
    margin-top: 40px;
}
a.brand {
    background: url(https://emajen.com/secure/i/site-logo-dark.png) no-repeat;
    background-size: 135px;
    margin: 10px 0px 0px 10px;
    width: 100px;
}
.brand img {
    display: none;
}

What about file_get_contents?

If you just want a simple, lightweight way to dump page data, readfile() will work. However, if you want to change the data before it is output, readfile() won’t help, because it doesn’t return the content for you to work with.

file_get_contents() does support data manipulation: it returns the page as a string (or false on failure), which you can modify before echoing it.

Keep in mind that we are processing data that has already been processed elsewhere: we receive the finished HTML as-is, so the HTML contents are the only thing we can change.
http://php.net/manual/en/function.file-get-contents.php
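
Because file_get_contents() returns a string, it also offers a way around the drawback mentioned earlier, where added HTML ends up outside the page’s document tags. A minimal sketch, assuming the pulled page contains an opening <body> tag (possibly with attributes) and PHP 7.4 or later for the arrow function: insert your own element right after that tag with preg_replace_callback(). A temporary local file with sample HTML stands in for the remote page.

```php
<?php
// A temporary local file stands in for the remote page (sample HTML only).
$path = tempnam(sys_get_temp_dir(), 'fgc_demo_');
file_put_contents($path, '<html><head></head><body class="home"><p>Original</p></body></html>');

// Unlike readfile(), file_get_contents() returns the content as a string
// (or false on failure), so it can be changed before it is echoed.
$pull = file_get_contents($path);
unlink($path);

if ($pull === false) {
    exit("Could not read the page\n");
}

// Element to add, as in the "Hello World" header example.
$header = '<div class="header">Hello World</div>';

// Insert the header right after the first opening <body ...> tag, so it
// ends up inside the document rather than before <html>.
$output = preg_replace_callback(
    '/<body\b[^>]*>/i',
    fn(array $m): string => $m[0] . $header,
    $pull,
    1
);

echo $output, "\n";
```

If the page has no <body> tag, the string is returned unchanged, so the header simply doesn’t appear.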

Let’s say you want to dump a web page onto your own domain name and add something to the URLs in the output. In this example I am prepending a URL that forces anchor links to open within a frame, so visitors stay on my domain. The URL is http://jarodthornton.com/iemajen/wordpress/link/o.php?out= and each link’s original address is appended after out=. There are all kinds of things you can do to change the content to your liking.

Here’s an example that does three things. First, it references our stylesheet to override the page output. Second, it adds our own HTML element. Lastly, it declares the variables for our output manipulation and uses str_replace() to find each <a href=" and replace it with <a href="http://jarodthornton.com/iemajen/wordpress/link/o.php?out=http://php.net/, prepending the frame URL and the source domain to every anchor link. In this example I am using http://php.net.

<link href="http://jarodthornton.com/iemajen/wordpress/readfile/php_net.css" rel="stylesheet">
<?php
// Our own HTML element, output before the pulled page
echo '<div class="header">Hello World</div>';

// Page to pull in (file_get_contents() needs allow_url_fopen for URLs)
$readfile = 'http://php.net/manual/en/function.readfile.php';
$pull = file_get_contents($readfile);

// Frame page that every link will be routed through
$link = 'http://jarodthornton.com/iemajen/wordpress/link/o.php?out=';
$href_quotation = '<a href="';

// Prepend the frame URL and the source domain to every anchor's href
echo str_replace($href_quotation, $href_quotation . $link . 'http://php.net/', $pull);
?>

You can see this in action at http://readfile.iemajen.com/php_net.php. Click on one of the links in the purple menu or on the right side to see the frame at work.
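
The post doesn’t show the o.php script that the rewritten links point to. As a hypothetical sketch of how such a frame page could work, the render_frame() function below (its name and markup are assumptions, not the author’s code) takes the out value, accepts only http and https URLs, and returns an iframe with the URL escaped for use in an HTML attribute.

```php
<?php
// Hypothetical frame page: renders the URL passed as ?out= inside an iframe.
function render_frame(string $out): string
{
    // Only allow http(s) URLs, so values like "javascript:..." are rejected.
    $scheme = strtolower((string) parse_url($out, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return '<p>Invalid link.</p>';
    }

    // Escape the URL before placing it in an HTML attribute.
    $src = htmlspecialchars($out, ENT_QUOTES, 'UTF-8');

    return '<iframe src="' . $src . '" style="width:100%;height:100vh;border:0"></iframe>';
}

// In o.php the value would come from the query string, e.g. o.php?out=http://php.net/
echo render_frame($_GET['out'] ?? ''), "\n";
```

Keep in mind that some sites forbid being framed, using the X-Frame-Options header or a Content-Security-Policy frame-ancestors directive, and browsers will refuse to display those pages inside the iframe.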

Another example, this time using WordPress.com (https://wordpress.com): http://readfile.iemajen.com/wordpress_com.php. Click on one of the links at the bottom of the page.

<link href="http://jarodthornton.com/iemajen/wordpress/readfile/wordpress_com.css" rel="stylesheet">
<?php
// Pull the WordPress.com home page (requires allow_url_fopen)
$readfile = 'https://wordpress.com/';
$pull = file_get_contents($readfile);
$link = 'http://jarodthornton.com/iemajen/wordpress/link/o.php?out=';
$href_quotation = '<a href="';
// Route every link through the frame page, prefixed with the source domain
echo str_replace($href_quotation, $href_quotation . $link . 'https://wordpress.com/', $pull);
?>
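
The php.net and WordPress.com examples differ only in the page being pulled and the domain prepended to each link. As a sketch, that str_replace() step could be wrapped in a small function (frame_links() is a hypothetical name) and checked against a sample string instead of a live page. It behaves exactly like the inline code above.

```php
<?php
// Hypothetical helper: the same str_replace() approach as the examples,
// with the frame URL and source domain passed in as parameters.
function frame_links(string $html, string $frameUrl, string $baseUrl): string
{
    $hrefQuotation = '<a href="';
    return str_replace($hrefQuotation, $hrefQuotation . $frameUrl . $baseUrl, $html);
}

$frame = 'http://jarodthornton.com/iemajen/wordpress/link/o.php?out=';
$html  = '<a href="manual/en/">Docs</a>';

echo frame_links($html, $frame, 'http://php.net/'), "\n";
```

With the helper in place, each site only needs one line, for example echo frame_links(file_get_contents('https://wordpress.com/'), $frame, 'https://wordpress.com/');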

The WordPress consolidation project I described earlier, which uses readfile() to show a WordPress page on a mapped domain, would benefit from file_get_contents(), so I will rewrite that code to get more control over the content.

Good practice?

Of course, using this approach with a domain other than your own may not be good practice. These examples pull from domains that aren’t mine, so I set up a robots.txt file that disallows all search engines:

User-agent: *
Disallow: /

https://en.wikipedia.org/wiki/Robots_exclusion_standard

Using it for your own projects is ideal because you control both the page being pulled and the page displaying it, so you can more easily adapt your code to that environment. Both examples come with a caveat: str_replace() only rewrites links that match <a href=" exactly, and there are other things to consider, like relative URLs, which would break with my code.
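
One way to handle more of those cases, sketched here as an alternative to the plain str_replace() rather than as the author’s code, is to match each anchor’s href with preg_replace_callback(), resolve it against the pulled page’s URL, and URL-encode it before prepending the frame URL. The helper name frame_links_resolved() is hypothetical. The sketch assumes double-quoted href values, does not collapse ../ segments, and assumes the frame page reads the target from $_GET['out'], which PHP URL-decodes automatically.

```php
<?php
// Hypothetical helper: route every <a ... href="..."> through the frame page,
// resolving relative links against the URL of the page that was pulled in.
// Simplified: assumes double-quoted href values and does not collapse "../".
function frame_links_resolved(string $html, string $frameUrl, string $pageUrl): string
{
    $parts  = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . $parts['host'];
    $path   = $parts['path'] ?? '/';
    $dir    = $origin . substr($path, 0, strrpos($path, '/') + 1);

    return preg_replace_callback(
        '/(<a\b[^>]*?\bhref=")([^"]*)"/i',
        function (array $m) use ($frameUrl, $scheme, $origin, $dir): string {
            $href = html_entity_decode($m[2], ENT_QUOTES, 'UTF-8');

            // Leave in-page anchors and mailto:, javascript:, tel: links alone.
            if ($href === '' || $href[0] === '#' || preg_match('/^(mailto|javascript|tel):/i', $href)) {
                return $m[0];
            }

            if (preg_match('#^https?://#i', $href)) {
                $absolute = $href;                   // already absolute
            } elseif (strncmp($href, '//', 2) === 0) {
                $absolute = $scheme . ':' . $href;   // protocol-relative
            } elseif ($href[0] === '/') {
                $absolute = $origin . $href;         // relative to the site root
            } else {
                $absolute = $dir . $href;            // relative to the page's folder
            }

            // Encode the target so its own query string survives inside ?out=.
            $target = $frameUrl . rawurlencode($absolute);
            return $m[1] . htmlspecialchars($target, ENT_QUOTES, 'UTF-8') . '"';
        },
        $html
    );
}

$frame = 'http://jarodthornton.com/iemajen/wordpress/link/o.php?out=';
$page  = 'http://php.net/manual/en/function.readfile.php';
$html  = '<a href="/downloads">Downloads</a> <a href="function.file-get-contents.php">Next</a>';

echo frame_links_resolved($html, $frame, $page), "\n";
```

Because the pattern only targets <a> tags, stylesheet <link href="..."> tags are left alone. Images and scripts with relative src attributes are still not rewritten, so they may need similar treatment.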


Speaking of Search Engines

So if we are using this on our own domain names, how do readfile() and file_get_contents() affect search presence and indexing? There are several things to consider. If we simply use the functions as I’ve outlined, we don’t control the pulled page’s outbound links: if a link is clicked, the visitor is taken to the actual website and leaves ours. In my examples, I worked around this by prepending a URL that opens the link in a frame.

The bottom line is that Google and other search engines can’t see the processing that takes place on the server. All they see is the output, so the page is indexed like any normal website.


Author Jarod Thornton
