PHP fopen() and concurrent reads/writes
MatthewHSE
Status: Contributor
Joined: 20 Jul 2004
Posts: 122
Location: Central Illinois, typically glued to a computer screen
I'm working on a script where users will get to add content to a .txt file on my server. (Yes, I'm being very careful to validate the data first! ;) ) The data I'm accepting is very limited and it will be easy to check if someone else has already added the same content to the file - if they have, duplicate submissions will be refused.

I'm using fopen() with the 'a+' flag to open the file for reading and writing. I'll be checking to see if the data has already been added to the file, and if not, writing it. So far, so good.

The problem is that several people may use this feature at the same time to write data to the same file. I can imagine a scenario in which one or the other runs into a file locking problem, or possibly one person's data doesn't get written to the file. Actually I'm not sure what will happen if two people try to write to the file at the same time, and I don't know how to test it.

The hurdles to overcome are:

1.) Prevent one user from locking the file, OR (since the file will only be open for a fraction of a second) keep other requests "in queue" until the first one is finished.

2.) If the file can be opened by multiple users at once, all in read-write mode, allow all additions to be added.

I hope I'm making this clear. Basically, all I need to know is how to allow one or several users to read and write to the file simultaneously, or how to queue the requests.
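
The check-then-append sequence described above can be protected by holding an exclusive lock for the whole read-and-write step. A minimal sketch, assuming PHP's built-in flock() and one entry per line; the function name addUniqueLine is hypothetical and not from the original post:

```php
<?php
/**
 * Append $entry to $path unless an identical line is already present.
 * Returns true if the entry was written, false if it was a duplicate.
 * (Hypothetical helper; assumes one entry per line.)
 */
function addUniqueLine(string $path, string $entry): bool
{
    $entry = trim($entry);

    // 'a+' creates the file if needed; writes always go to the end,
    // while the read position can be moved with rewind().
    $fh = fopen($path, 'a+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // Without LOCK_NB this call waits until the lock is free, so
        // concurrent requests are effectively queued at this point.
        if (!flock($fh, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $path");
        }

        rewind($fh);
        while (($line = fgets($fh)) !== false) {
            if (trim($line) === $entry) {
                return false; // already in the file
            }
        }

        fwrite($fh, $entry . "\n");
        fflush($fh); // push the data out before the lock is released
        return true;
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}
```

Note that flock() is advisory: it only helps if every script that touches the file takes the same lock.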
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4129
Location: East Coast, West Coast? I know it's one of them.
As far as I know you can't queue file writes. Some of the issues were dealt with in the modified spider blocking script, which is a slightly more complex version of Birdman's. Both links require webmasterworld membership to view.

The real question though is why not just do it with a database? Then you can just output the database into a text file, that's pretty easy to do, looks the same to end users too. And you avoid all the potential simultaneous adds and processing of a single file. There's another script from that spider blocking, but I can't find it, it has a simpler logic of how to check if the script is in use.

But remember, when you run a php page, it doesn't wait, it just runs, so if the file is already open, it will get unpredictable, if I remember that correctly. I only use writing to files when the writes are very few and far between, like on a relatively low traffic spider trap for example. This circumstance is almost the exact definition of why and when you should be using a database.

As far as I know, php doesn't have anything like a timer, it will either find the file open and available or locked, if it's locked the write will fail. Feel free to double check this, but I think it's right, just picture a totally linear execution in your mind, you can't put that on hold as far as I know. Remember, one page access doesn't know what another page access is doing, it's unique, isolated. For php, each and every page view is a totally new experience for it, unless there is some type of session data available, cookies, and even then it's just related to one user.

So when user a and user b arrive at the same time, fill out the form at the same time, submit at the same time, my guess is the first one to reach the file and lock it wins.
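
Whether the second request fails or waits depends on how the lock is requested. As a hedged sketch: PHP's flock() waits by default and only gives up immediately when LOCK_NB is added. The temp file and variable names below are purely illustrative, and two handles in one script stand in for two simultaneous page requests (this relies on the platform treating separate handles independently, as Linux does):

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'lock');

// Two independent handles stand in for two simultaneous requests.
$first  = fopen($path, 'a+');
$second = fopen($path, 'a+');

$firstHasLock = flock($first, LOCK_EX);

// LOCK_NB: return immediately instead of waiting for the lock.
// Expected to fail while $first still holds its lock.
$secondHasLock = flock($second, LOCK_EX | LOCK_NB);

flock($first, LOCK_UN);

// With the first lock released, the same request can now succeed.
$secondRetry = flock($second, LOCK_EX | LOCK_NB);

flock($second, LOCK_UN);
fclose($first);
fclose($second);
unlink($path);

var_dump($firstHasLock, $secondHasLock, $secondRetry);
```

Dropping LOCK_NB from the second call would make it wait for the first lock to clear instead of failing, which is the "queue" behaviour asked about in the first post.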
MatthewHSE
That's what I figured would be the case. I think I've found a way around this using temporary files. The reason I'm using flat files for this is because the amount of data is very small and I'd hoped to avoid the overhead of a database.

Actually, this is a newsletter signup form, the same one I was asking about Ajax to help make it work right. I'm ironing out the serverside part of the process and am running into some pretty amusing, though frustrating, problems. I'll start a new thread with full details; I know you like a challenge so this should be right up your alley! ;)
techAdmin
I would recommend using a database, for several reasons:

1. It solves the simultaneous issue easily.
2. More important, it gives you room to change and expand in the future. Since you essentially need only two fields, the unique id key, which is created automatically, and the actual data, there is really no overhead. The script simply takes the data whenever somebody enters it, checks it against previous entries, then updates the db. That isn't very much overhead.
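
The check against previous entries can be pushed into the database itself with a unique constraint. A minimal sketch, assuming PDO with the SQLite driver is available (the thread has MySQL in mind; SQLite is swapped in here only so the example runs without a server). The table name signups, its columns, and the helpers openSignupDb and addSignup are made up for illustration:

```php
<?php
function openSignupDb(string $dsn = 'sqlite::memory:'): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // id is generated automatically; UNIQUE rejects duplicate entries.
    $db->exec(
        'CREATE TABLE IF NOT EXISTS signups (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            email TEXT NOT NULL UNIQUE
        )'
    );
    return $db;
}

/** Returns true if the row was inserted, false if it already existed. */
function addSignup(PDO $db, string $email): bool
{
    // INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    $stmt = $db->prepare('INSERT OR IGNORE INTO signups (email) VALUES (:email)');
    $stmt->execute([':email' => trim($email)]);
    return $stmt->rowCount() === 1;
}
```

Because the duplicate check and the insert are a single statement, two simultaneous submissions of the same value cannot both get through.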

I wrote a mailing list database app for a friend of mine. It's very small too, but it kicks butt and it solved all our problems: updated csv files for import into the mass mailing software get sent to us by email. I never ever have to deal with the mailing list again, in fact, I barely look at the site anymore, it's almost all automated.

Overhead is worth worrying about when you start approaching 10 or 15 simultaneous database connections, or users, on that page, per second. That's a decent amount of traffic, assuming it's on a shared mysql database server. If it's your server, you don't have to worry about it at all until you start approaching I'd guess 100 k visitors a day. Or more.

If you need a simple script to spit out a text file let me know, I have a few done, you'd have to change the variables etc of course, but it's really easy. Or you can have the file be created live whenever somebody requests it, that works too, avoids some issues like writing to the same file while it's open. All programming books say avoid file based stuff as much as possible, when you are dealing with live stuff, I use it but only in two circumstances, one is when there is only one user, from an admin panel, doing the updates, but even there I also write it to a database in case I want to use it for something else later.
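
A script that spits a table out as a text file could look roughly like this. It is only a sketch: the signups table, its email column, and the function name exportSignups are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own:

```php
<?php
function exportSignups(PDO $db, string $path): int
{
    $rows = $db->query('SELECT email FROM signups ORDER BY id')
               ->fetchAll(PDO::FETCH_COLUMN);

    // Write to a temporary file in the same directory, then rename it
    // into place so readers never see a half-written export.
    $tmp = tempnam(dirname($path), 'exp');
    file_put_contents($tmp, $rows ? implode("\n", $rows) . "\n" : '');
    rename($tmp, $path);

    return count($rows);
}

// Demo with an in-memory database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE signups (id INTEGER PRIMARY KEY, email TEXT UNIQUE)');
$db->exec("INSERT INTO signups (email) VALUES ('a@example.com'), ('b@example.com')");

$out = sys_get_temp_dir() . '/signups.txt';
$count = exportSignups($db, $out);
```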

Just remember when you create the database, try to look into the future and anticipate future requirements or additions, it helps a lot, it's easier to build something with 4 fields now, say, than to add 2 fields later.
MatthewHSE
Thanks, I'll probably wind up with a database eventually, but in the meantime I'm having fun trying to figure out why some of the errors I've been experiencing have been able to occur.

The main reason I figured to avoid a database was because you supposedly can't beat a flat file for speed. Since this is being processed in real time with Ajax, I wanted as much speed as possible. As you said, though, there's not much overhead here, and usage will be low, so a database is probably the simplest method after all.

More here...
techAdmin
:: Quote ::
you supposedly can't beat a flat file for speed


Hard to believe that would be true. I've always been very fond of static flat files, it's easy to work with them, and most bigger stuff I've done uses those primarily for data. But I always write the scripts to load everything once, then keep it in ram until the script is finished executing. In other words, if I have an array data file, the script opens it, puts it into a variable, sets a flag, then delivers the variable to the rest of the data calls; the flag, once set, blocks further accesses to the data file. And the flat data files are always static.
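
The load-once pattern described here might look like the following sketch. The function name loadData, the one-record-per-line file layout, and the use of a static variable in place of the "flag" are assumptions about the approach, not the actual scripts:

```php
<?php
function loadData(string $path): array
{
    static $cache = [];   // survives between calls within one request

    if (!array_key_exists($path, $cache)) {
        // First call for this file: hit the disk once.
        $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            throw new RuntimeException("Cannot read $path");
        }
        $cache[$path] = $lines;
    }

    // Every later call is served from memory.
    return $cache[$path];
}
```

The cache only lives for one script run, which matches the point that each page view starts fresh.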

For super heavy applications, flat files can work, but it's hard to deal with all the issues, and it's not really worth it. What happens with db sites, the problem there is that there is so much data being accessed, and sql is being used so much, that the processor bogs down, the amount of data fills ram, and has to be written to disk in swap, everything slows down. But you're not anywhere remotely close to a situation like that, you'll now and then be making a single set of select and update queries, that's it.

But I'll happily admit I dislike database programming, it's hard, debugging is hard, and being able to just upload a flat file to any server anywhere is nice. Plus updating flat file data is really easy, you just use a text editor, no need to write an update form etc.

BUT..... I don't think the flat file access speed is correct, I might be wrong, but here's how I perceive it logically:

A flat file resides on the hard disk. Hard disks are by a huge factor the slowest way to access data for a computer. Ram moves data at something like 4 gigabytes a second. For the geeks among us, here's a cool data speed section, I found it at hardwaresecrets.com. I tried copying the memory data rate table but it wouldn't work, that's a useful collection of system speed factoids though, many pages, each about one part of the computer.

:: Quote ::
Actually, if you have an AMD CPU you won't have problems. For instance, if you have an Athlon XP with 400 MHz (3,200 MB/s transfer rate) external bus just install DDR400 memories on your computer and it will work great since both the CPU and the memory will be running at the same speed grade (3,200 MB/s). You can use DDR Dual Channel to improve the performance, anyway, as we will be talking about on the next page.


So say you have a memory transfer speed of 3.2 GB per second. Most databases tend to keep the database in ram. Average hard drive transfer speeds, depending on the server and configuration, are about 40 MB per second sustained, more for SCSI, but most lower end servers don't use SCSI.

As you can see, even with the overhead of running mysql say, once the connection is made and the query is responded to, the data is available at a rate about 80 times faster than through disk access (3,200 MB/s against 40 MB/s).

And that's slower single channel DDR 1. Dual channel ddr 1 and 2 get up to about 10-12 GB per second transfer rates, a whopping 250 to 300 times faster than reading directly from the hard drive.
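
Working the ratios through with the figures quoted above (3.2 GB/s for single channel DDR, roughly 10 to 12 GB/s for dual channel, 40 MB/s sustained for the disk), as a quick sanity check rather than a benchmark:

```php
<?php
$diskMBps = 40; // sustained hard drive rate quoted above

$ratios = [
    'single channel DDR (3,200 MB/s)'      => 3200 / $diskMBps,
    'dual channel, low end (10,000 MB/s)'  => 10000 / $diskMBps,
    'dual channel, high end (12,000 MB/s)' => 12000 / $diskMBps,
];

foreach ($ratios as $label => $ratio) {
    printf("%s: about %d times the disk rate\n", $label, $ratio);
}
```

That gives 80, 250, and 300 respectively. These are peak transfer rates only; connection and query overhead are left out.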