next up previous contents
Next: About this document ... Up: Installing and Running mod_log_sql Previous: 3 Configuration   Contents

Subsections

4 FAQ

4.1 General module questions


4.1.1 Why log to an SQL database?

To begin with, let's get it out of the way: logging to a database is not a panacea. But while there are complexities with this solution, the benefit can be substantial for certain classes of administrator or people with advanced requirements:

For example, do you want to see all your 404's? Do this:

select remote_host,status,request_uri,bytes_sent,from_unixtime(time_stamp)

from acc_log_tbl where status=404 order by time_stamp;

remote_host status request_uri bytes_sent from_unixtime(time_stamp)
marge.mmm.co.uk 404 /favicon.ico 321 2001-11-20 02:30:56
62.180.239.251 404 /favicon.ico 333 2001-11-20 02:45:25
212.234.12.66 404 /favicon.ico 321 2001-11-20 03:01:00
212.210.78.254 404 /favicon.ico 333 2001-11-20 03:26:05

Or do you want to see how many bytes you've sent within a certain directory or site? Do this:

select request_uri,sum(bytes_sent) as bytes,count(request_uri) as howmany from

acc_log_tbl where request_uri like '%mod_log_sql%' group by request_uri order

by howmany desc; 

request_uri bytes howmany
/mod_log_sql/style_1.css 157396 1288
/mod_log_sql/ 2514337 801
/mod_log_sql/mod_log_sql.tar.gz 9769312 456
/mod_log_sql/faq.html 5038728 436

Or maybe you want to see who's linking to you? Do this:

select count(referer) as num,referer from acc_log_tbl where

request_uri='/mod_log_sql/' group by referer order by num desc;

num referer
271 http://freshmeat.net/projects/mod_log_sql/
96 http://modules.apache.org/search?id=339
48 http://freshmeat.net/
8 http://freshmeat.net

As you can see, there are myriad possibilities that can be constructed with the wonderful SQL SELECT statement. Logging to an SQL database can be really quite useful!

4.1.2 Why use MySQL? Are there alternatives?

MySQL is a robust, free, and very powerful production-quality database engine. It is well supported and comes with detailed documentation. Many 3rd-party software pacakges (e.g. Slashcode, the engine that powers Slashdot) run exclusively with MySQL. In other words, you will belong to a very robust and well-supported community by choosing MySQL.

That being said, there are alternatives. PostgreSQL is probably MySQL's leading "competitor" in the free database world. There is also an excellent module available for Apache to permit logging to a PostgreSQL database, called pgLOGd (http://www.digitalstratum.com/pglogd/).

4.1.3 Is this code production-ready?

By all accounts it is. It is known to work without a problem on many-thousands-of-hits-per-day webservers. Does that mean it is 100% bug free? Well, no software is. But it is well-tested and believed to be fully compatible with production environments. (The usual disclaimers apply. This software is provided without warranty of any kind.)

4.1.4 Who's using mod_log_sql?

Good question! It would be great to find out! If you are a production-level mod_log_sql user, please contact the maintainer, Chris Powell (chris@grubbybaby.com) so that you can be mentioned here.

4.1.5 Why doesn't the module also replace the Apache ErrorLog?

There are circumstances when that would be quite unwise - for example, if Apache could not reach the MySQL server for some reason and needed to log that fact. Without a text-based error log you'd never know anything was wrong, because Apache would be trying to log a database connection error to the database... you get the point.

Error logs are usually not very high-traffic and are really best left as text files on a web server machine.

4.1.6 Does mod_log_sql work with Apache 2.x?

As of this writing, no. The Apache Group significantly altered the module API with the release of Apache 2.0. All modules written for 1.3, including mod_log_sql, will not work with 2.0.

mod_log_sql will eventually be ported to Apache 2.x, but not immediately. It is going to take some time, and there are other features that have higher priority. Please sign up for the announcements list (on the main website) or monitor the website for updates to learn when the port (and other releases) are available.

<OPINION>If you're a *NIX user, stick with Apache 1.3.x for now. Major modules like mod_ssl and PHP are not even ready for 2.0 yet, and the main benefits in 2.0 are for Win32 users anyway. Apache 1.3.x is rock-stable and performs equally well on *NIX as 2.0.</OPINION>

4.1.7 Does mod_log_sql connect to MySQL via TCP/IP or a socket?

It depends! This is not determined by mod_log_sql. mod_log_sql relies on a connection command that is supplied in the MySQL API, and that command is somewhat intelligent. How it works:

You don't have any control of which methodology is used. You can fine-tune some of the configuration, however. The LOGSQLSOCKETFILE runtime configuration directive overrides the default of ``/var/lib/mysql/mysql.sock'' for socket-based connections, whereas the LOGSQLTCPPORT command allows to you override the default TCP port of 3306 for TCP/IP connections.

4.1.8 I have discovered a bug. Who can I contact?

Please contact the maintainer (chris@grubbybaby.com)! Your comments, suggestions, bugfixes, bug catches, and usage testimonials are always welcome. As free software, mod_log_sql is intended to be a community effort - any code contributions or other ideas will be fully and openly credited, of course.

4.2 Problems

4.2.1 Apache segfaults when using PHP and mod_log_sql

This occurs if you compiled PHP with MySQL database support. PHP utilizes its internal, bundled MySQL libraries by default. These conflict with the ``real'' MySQL libraries linked by mod_log_sql, causing the segmentation fault.

The solution is to configure PHP to link against the real MySQL libraries and recompile mod_php. Apache will run properly once the modules are all using the same version of the MySQL libraries.


4.2.2 Apache appears to start up fine, but nothing is getting logged in the database

If you do not see any entries in the access_log, then something is preventing the inserts from happening. This could be caused by several things:

Important: it is improper to ask for help before you have followed these steps.

First examine the MySQL log that you established in step 6 of section 3.1. Ensure that the INSERT statements are not being rejected because of a malformed table name or other typographical error. By enabling that log, you instructed MySQL to log every connection and command it receives - if you see no INSERT attempts in the log, the module isn't successfully connecting to the database. If you see nothing at all in the log - not even a record of your administrative connection attempts, then you did not enable the log correctly. If you do see INSERT attempts but they are failing, the log should tell you why.

Second, confirm that your LOGSQL* directives are all correct.

Third, examine the Apache error logs for messages from mod_log_sql; the module will offer hints as to why it cannot connect, etc.

The next thing to do is recompile the module with debugging output activated. change the "#undef DEBUG" on line 8 of mod_log_sql.c to "#define DEBUG" and recompile/reinstall. The module will now output copious notes about what it is doing, and this will help you (and the maintainer) solve the problem. In order to see the debugging messages, ensure that you make them visible using the LOGLEVEL directive in the main server config as well as in each VIRTUALHOST config:

LogLevel debug

ErrorLog /var/log/httpd/server-messages 

4.2.3 Why do I get the message ``insufficient configuration info to establish database link'' in my Apache error log?

At a minimum, LOGSQLDATABASE and LOGSQLLOGININFO must be defined in order for the module to be able to establish a database link. If these are not defined or are incomplete you will receive this error message.

4.2.4 My database cannot handle all the open connections from mod_log_sql, is there anything I can do?

The rule of thumb: if you have n webservers each configured to support y MAXCLIENTS, then your database must be able to handle $n\times y$ simultenous connections in the worst case. Certainly you must use common sense, consider reasonable traffic expectations and structure things accordingly.

Tweaking my.cnf to scale to high connection loads is imperative. But if hardware limitations prevent your MySQL server from gracefully handling the number of incoming connections, it would be beneficial to upgrade the memory or CPU on that server in order to handle the load.

Jeremy Zawodny, a highly respected MySQL user and contributor to Linux Magazine, has this very helpful and highly appropriate article on tuning MySQL: http://jeremy.zawodny.com/blog/archives/000173.html

Please remember that mod_log_sql's overriding principle is performance - that is what the target audience demands and expects. Other database logging solutions do not open and maintain many database connections, but their performance suffers drastically. For example, pgLOGd funnels all log connections through a separate daemon that connects to the database, but that bottlenecks the entire process. mod_log_sql achieves performance numbers an order of magnitude greater than the alternatives because it dispenses with the overhead associated with rapid connection cycling, and it doesn't attempt to shoehorn all the database traffic through a single extra daemon or proxy process.

4.2.5 Why do I occasionally see a ``lost connection to MySQL server'' message in my Apache error log?

This message may appear every now and then in your Apache error log, especially on very lightly loaded servers. This doesn't mean that anything is necessarily wrong. Within each httpd child process, mod_log_sql will open (and keep open) a connection to the MySQL server. MySQL, however, will close connections that haven't been used in a while; the default timeout is 8 hours. When this occurs, mod_log_sql will notice and re-open the connection. That event is what is being logged, and looks like this:

[Tue Nov 12 19:04:10 2002] [error] mod_log_sql: first attempt failed, 

  API said: error 2013, Lost connection to MySQL server during query 

[Tue Nov 12 19:04:10 2002] [error] mod_log_sql: reconnect successful

[Tue Nov 12 19:04:10 2002] [error] mod_log_sql: second attempt successful

Reference: MySQL documentation (http://www.mysql.com/documentation/mysql/bychapter/manual_Problems.html#Gone_away)

4.3 Performance and Tuning

4.3.1 How well does it perform?

mod_log_sql scales to very high loads. Apache 1.3.22 + mod_log_sql was benchmarked using the "ab" (Apache Bench) program that comes with the Apache distribution; here are the results.

Overall configuration:

Apache configuration:

Timeout 300 

KeepAlive On 

MaxKeepAliveRequests 100 

KeepAliveTimeout 15 

MinSpareServers 5 

StartServers 10 

MaxSpareServers 15 

MaxClients 256 

MaxRequestsPerChild 5000 

LogSQLTransferLogFormat AbHhmRSsTUuvc 

LogSQLWhichCookie Clicks 

CookieTracking on 

CookieName Clicks

"ab" commandline:

./ab -c 10 -t 20 -v 2 -C Clicks=ab_run http://www.hostname.com/target 
( 10 concurrent requests; 20 second test; setting a cookie "Clicks=ab_run"; target = the mod_log_sql homepage. )

Ten total ab runs were conducted: five with MySQL logging enabled, and five with all MySQL directives commented out of httpd.conf. Then each five were averaged. The results:

In other words, any rate-limiting effects on this particular hardware setup are not caused by MySQL. Note that although this very simple webserver setup is hardly cutting-edge - it is, after all, a fairly small machine - 139 requests per second equal over twelve million hits per day.

If you run this benchmark yourself, take note of three things:

  1. Use a target URL that is on your own webserver :-).
  2. Wait until all your connections are closed out between runs; after several thousand requests your TCP/IP stack will be filled with hundreds of connections in TIME_WAIT that need to close. Do a "netstat -t|wc -l" on the webserver to see. If you don't wait, you can expect to see a lot of messages like "ip_conntrack: table full, dropping packet" in your logs. (This has nothing to do with mod_log_sql, this is simply the nature of the TCP/IP stack in the Linux kernel.)
  3. When done with your runs, clean these many thousands of requests out of your database:
mysql> delete from access_log where agent like 'ApacheBench%'; 

mysql> optimize table access_log; 

4.3.2 Do I need to be worried about all the running MySQL children? Will holding open n Apache-to-MySQL connections consume a lot of memory?

Short answer: you shouldn't be worried.

Long answer: you might be evaluating at the output of ``ps -aufxw'' and becoming alarmed at all the 7MB httpd processes or 22MB mysqld children that you see. Don't be alarmed. It's true that mod_log_sql opens and holds open many MySQL connections: each httpd child maintains one open database connection (and holds it open for performance reasons). Four webservers, each running 20 Apache children, will hold open 80 MySQL connections, which means that your MySQL server needs to handle 80 simultaneous connections. In truth, your MySQL server needs to handle far more than that if traffic to your website spikes and the Apache webservers spawn off an additional 30 children each...

Fortunately the cost reported by 'ps -aufxw' is deceptive. This is due to an OS memory-management feature called ``copy-on-write.'' When you have a number of identical child processes (e.g. Apache, MySQL), it would appear in ``ps'' as though each one occupies a great deal of RAM - as much as 7MB per httpd child! In actuality each additional child only occupies a small bit of extra memory - most of the memory pages are common to each child and therefore shared in a ``read-only'' fashion. The OS can get away with this because the majority of memory pages for one child are identical across all children. Instead of thinking of each child as a rubber stamp of the others, think of each child as a basket of links to a common memory area.

A memory page is only duplicated when it needs to be written to, hence ``copy-on-write.'' The result is efficiency and decreased memory consumption. ``ps'' may report 7MB per child, but it might really only ``cost'' 900K of extra memory to add one more child. It is not correct to assume that 20 Apache children with a VSZ of 7MB each equals $(20\times 7MB)$ of memory consumption - the real answer is much, much lower. The same ``copy-on-write'' rules apply to all your MySQL children: 40 mysqld children @ 22MB each do not occupy 880MB of RAM.

The bottom line: although there is a cost to spawn extra httpd or mysqld children, that cost is not as great as ``ps'' would lead you to believe.

4.3.3 My webserver cannot handle all the traffic that my site receives, is there anything I can do?

If you have exhausted all the tuning possibilities on your existing server, it is probably time you evaluated the benefits of clustering two or more webservers together in a load-balanced fashion. In fact, users of such a setup are mod_log_sql's target audience!


4.3.4 What is the issue with activating delayed inserts?

There are several.

  1. INSERT DELAYED is a specific syntax to MySQL and is not supported by any other database. Ergo, why is it needed, and what MySQL deficiency is it working around? INSERT DELAYED is a kluge.
  2. The MySQL documentation is unclear whether INSERT DELAYED is even necessary for an optimized database. It says, ``The DELAYED option for the INSERT statement is a MySQL-specific option that is very useful if you have clients that can't wait for the INSERT to complete.'' But then it goes on to say, ``Note that as MyISAM tables supports concurrent SELECT and INSERT, if there is no free blocks in the middle of the data file, you very seldom need to use INSERT DELAYED with MyISAM.''
  3. Because INSERT DELAYED returns without waiting for the data to be written, a hard kill of your MySQL database at the right (wrong?) moment could lose those logfile entries.
  4. As of MySQL version 3.23.52, the error return functions disagree after a failed INSERT DELAYED: mysql_errno() always returns 0, even if mysql_error() returns a textual error. I have reported this bug to the MySQL folks. However, we have no way of knowing what solution they will adopt to fix this, and with the worst case solution mod_log_sql would not be able to tell if anything went wrong with a delayed insert.
Instead of delayed inserts, you may wish to utilize InnoDB tables (instead of the standard MyISAM tables). InnoDB tables suppot row-level locking and are recommended for high-volume databases.

If after understanding these problems you still wish to enable delayed inserts, section 3.5.4 discusses how.

4.4 ``How do I...?'' - accomplishing certain tasks

4.4.1 I am using LogSQLMassVirtualHosting, and sometimes a single VirtualHost gets logged to two different tables. How do I prevent that?

Proper usage of the Apache runtime SERVERNAME directive and the directive USECANONICALNAME ON (or DNS) are necessary to prevent this problem. ``On'' is the default for USECANONICALNAME, and specifies that self-referential URLs are generated from the SERVERNAME part of your VirtualHost:

With UseCanonicalName on (and in all versions prior to 1.3) Apache will use the ServerName and Port directives to construct the canonical name for the server. With UseCanonicalName off Apache will form self-referential URLs using the hostname and port supplied by the client if any are supplied (otherwise it will use the canonical name, as defined above). [From the Apache documentation http://httpd.apache.org/docs/mod/core.html#usecanonicalname]
The module inherits Apache's ``knowledge'' about the server name being accessed. As long as those two directives are properly configured, mod_log_sql will log to only one table per virtual host while using LOGSQLMASSVIRTUALHOSTING.

4.4.2 How do I extract the data in a format that my analysis tool can understand?

mod_log_sql would be virtually useless if there weren't a way for you to extract the data from your database in a somewhat meaningful fashion. To that end there's a Perl script enclosed with the distribution. That script (make_combined_log.pl) is designed to extract N-many days worth of access logs and provide them in a Combined Log Format output. You can use this very tool right in /etc/crontab to extract logs on a regular basis so that your favorite web analysis tool can read them. Or you can examine the Perl code to construct your own custom tool.

For example, let's say that you want your web statistics updated once per day in the wee hours of the morning. A good way to accomplish that could be the following entries in /etc/crontab:

# Generate the temporary apache logs from the MySQL database (for webalizer) 

05 04 * * * root make_combined_log.pl 1 www.grubbybaby.com > /var/log/temp01

# Run webalizer on httpd log 

30 04 * * * root webalizer -c /etc/webalizer.conf; rm -f /var/log/temp01

Or if you have a newer system that puts files in /etc/cron.daily etc., create a file called ``webalizer'' in the cron.daily subdirectory. Use the following as the contents of your file, and make sure to chmod 755 it when done.

#!/bin/sh

/usr/local/sbin/make_combined_log.pl 1 www.yourdomain.com > /var/log/httpd/templog

/usr/local/bin/webalizer -q -c /etc/webalizer.conf 

rm -f /var/log/httpd/templog

See? Easy.


4.4.3 How can I log mod_usertrack cookies?

A number of people like to log mod_usertrack cookies in their Apache TransferLog to aid in understanding their visitors' clickstreams. This is accomplished, for example, with a statement as follows:

LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"" \"%{cookie}n\""
Naturally it would be nice for mod_log_sql to permit the admin to log the cookie data as well, so as of version 1.10 you can do this. You need to have already compiled mod_usertrack into httpd - it's one of the standard Apache modules.

First make sure you have a column called "cookie" in the MySQL database to hold the cookies, which can be done as follows if you already have a working database:

alter table acc_log_tbl add column cookie varchar(255);
Next configure your server to set usertracking cookies as follows, and make sure you include the new 'c' directive in your LOGSQLTRANSFERLOGFORMAT, which activates cookie logging. Here's an example:

<VirtualHost 1.2.3.4> 

 CookieTracking on 

 CookieStyle Cookie 

 CookieName Foobar 

 LogSQLTransferLogFormat huSUsbTvRAc 

 LogSQLWhichCookie Foobar 

</VirtualHost>

The first three lines configure mod_usertrack to create a COOKIE (RFC 2109) format cookie called Foobar. The last two lines tell mod_log_sql to log cookies named Foobar. You have to choose which cookie to log because more than one cookie can/will be sent to the server by the client.

Recap: the 'c' character activates cookie logging, and the LOGSQLWHICHCOOKIE directive chooses which cookie to log.

FYI, you are advised NOT to use COOKIESTYLE COOKIE2 - it seems that even newer browsers (IE 5.5, etc.) have trouble with the new COOKIE2 (RFC 2965) format. Just stick with the standard COOKIE format and you'll be fine.

Perform some hits on your server and run a select:

mysql> select request_uri,cookie from access_log where cookie is not null;

request_uri cookie
/mod_log_sql/ ool-18e4.dyn.optonline.net.130051007102700823
/mod_log_sql/usa.gif ool-18e4.dyn.optonline.net.130051007102700823
/mod_log_sql/style_1.css ool-18e4.dyn.optonline.net.130051007102700823

4.4.4 What if I want to log more than one cookie? What is the difference between LogSQLWhichCookie and LogSQLWhichCookies?

As of version 1.17, you have a choice in how you want cookie logging handled.

If you are interested in logging only one cookie per request, follow the instructions in section 4.4.3 above. That cookie will be logged to a column in the regular access_log table, and the actual cookie you want to log is specified with LOGSQLWHICHCOOKIE. Don't forget to specify the 'c' character in LOGSQLTRANSFERLOGFORMAT.

If, however, you need to log multiple cookies per request, you must employ the LOGSQLWHICHCOOKIES (note the plural) directive. The cookies you specify will be logged to a separate table (as discussed in section 3.5.2), and entries in that table will be linked to the regular access_log entries via the unique ID that is supplied by mod_unique_id. Without mod_unique_id the information will still be logged but you will be unable to correlate which cookies go with which access-requests. Furthermore, with LOGSQLWHICHCOOKIES, you do not need to include the 'c' character in LOGSQLTRANSFERLOGFORMAT.

LOGSQLWHICHCOOKIE and LOGSQLWHICHCOOKIES can coexist without conflict because they operate on entireley different tables, but you're better off choosing the one you need.

4.4.5 What are the SSL logging features, and how do I activate them?

Note: you do not need to compile SSL support into mod_log_sql in order to simply use it with a secure site. You only need to compile SSL support into mod_log_sql if you want to log SSL-specific data such as the cipher type used, or the keysize that was negotiated. If that information is unimportant to you, you can ignore this FAQ.

By adding certain characters to your LOGSQLTRANSFERLOGFORMAT string you can tell mod_log_sql to log the SSL cipher, the SSL keysize of the connection, and the maximum keysize that was available. This would let you tell, for example, which clients were using only export-grade security to access your secure software area.

You can compile mod_log_sql with SSL logging support if you have the right packages installed. If you already have an SSL-enabled Apache then you by definition have the correct packages already installed: OpenSSL and mod_ssl.

You need to ensure that your database is set up to log the SSL data. Issue the following commands to MySQL if your access table does not already have them:

alter table access_log add column ssl_cipher varchar(25);

alter table access_log add column ssl_keysize smallint unsigned;

alter table access_log add column ssl_maxkeysize smallint unsigned;

Finally configure httpd.conf to activate the SSL fields. Note that this is only meaningful in a VirtualHost that is set up for SSL.

<VirtualHost 1.2.3.4:443> 

 LogSQLTransferLogFormat AbHhmRSsTUuvcQqz 

</VirtualHost>

The last three characters (Qqz) in the directive are the SSL ones; see section 3.6.17 in the directives documentation for details of the LOGSQLTRANSFERLOGFORMAT directive.

Restart Apache, then perform some hits on your server. Then run the following select statement:

mysql> select remote_host,request_uri,ssl_cipher,ssl_keysize,ssl_maxkeysize

from access_log where ssl_cipher is not null;

remote_host request_uri ssl_cipher ssl_keysize ssl_maxkeysize
216.190.52.4 /dir/somefile.html RC4-MD5 128 128
216.190.52.4 /dir/somefile.gif RC4-MD5 128 128
216.190.52.4 /dir/somefile.jpg RC4-MD5 128 128


next up previous contents
Next: About this document ... Up: Installing and Running mod_log_sql Previous: 3 Configuration   Contents
Chris Powell 2002-12-18