mod_log_sql 2.0 Documentation

Home » Projects » Apache » Mod_log_sql » Docs-2.0 » mod_log_sql 2.0 Documentation

Prev: Configuration --- Up: TOC

FAQ

1. General module questions
1. Why log to an SQL database?
2. Why use MySQL? Are there alternatives?
3. Is this code production-ready?
4. Who's using mod_log_sql?
5. Why doesn't the module also replace the Apache ErrorLog?
6. Does mod_log_sql work with Apache 2.x?
7. Does mod_log_sql connect to MySQL via TCP/IP or a socket?
8. I have discovered a bug. Who can I contact?
2. Problems
1. Apache segfaults or has other problems when using PHP and mod_log_sql
2. Apache appears to start up fine, but nothing is getting logged in the database
3. Why do I get the message "insufficient configuration info to establish database link" in my Apache error log?
4. My database cannot handle all the open connections from mod_log_sql, is there anything I can do?
5. Why do I occasionally see a "lost connection to MySQL server" message in my Apache error log?
6. Sometimes a single VirtualHost gets logged to two different tables (e.g. access_foo_com, access_www_foo_com). Or, accesses to an unqualified hostname (e.g. "http://intranet/index.html") get logged in separate tables.
3. Performance and Tuning
1. How well does it perform?
2. Do I need to be worried about all the running MySQL children? Will holding open n Apache-to-MySQL connections consume a lot of memory?
3. My webserver cannot handle all the traffic that my site receives, is there anything I can do?
4. What is the issue with activating delayed inserts?
4. "How do I...?" -- accomplishing certain tasks
1. How do I extract the data in a format that my analysis tool can understand?
2. How can I log mod_usertrack cookies?
3. What if I want to log more than one cookie? What is the difference between LogSQLWhichCookie and LogSQLWhichCookies?
4. What are the SSL logging features, and how do I activate them?

General module questions

1.1.Why log to an SQL database?

To begin with, let's get it out of the way: logging to a database is not a panacea. But while there are complexities with this solution, the benefit can be substantial for certain classes of administrator or people with advanced requirements:

  • Chores like log rotation go away, as you can DELETE records from the SQL database once they are no longer useful. For example, the excellent and popular log-analysis tool Webalizer (http://www.webalizer.com) does not need historic logs after it has processed them, enabling you to delete older logs.

  • People with clusters of web servers (for high availability) will benefit the most - all their webservers can log to a single SQL database. This obviates the need to collate/interleave the many separate logfiles, which can be / highly/ problematic.

  • People acquainted with the power of SQL SELECT statements will know the flexibility of the extraction possibilities at their fingertips.

For example, do you want to see all your 404's? Do this:

SELECT remote_host, status, request_uri, bytes_sent, from_unixtime(time_stamp)
FROM acc_log_tbl WHERE status=404 ORDER BY time_stamp;
Table 9:
remote_host status request_uri bytes_sent from_unixtime(time_stamp)
marge.mmm.co.uk 404 /favicon.ico 321 2001-11-20 02:30:56
62.180.239.251 404 /favicon.ico 333 2001-11-20 02:45:25
212.234.12.66 404 /favicon.ico 321 2001-11-20 03:01:00
212.210.78.254 404 /favicon.ico 333 2001-11-20 03:26:05

Or do you want to see how many bytes you've sent within a certain directory or site? Do this:

SELECT request_uri,sum(bytes_sent) AS bytes, count(request_uri) AS howmany
FROM acc_log_tbl
WHERE request_uri LIKE '%mod_log_sql%'
GROUP BY request_uri ORDER BY howmany DESC;
Table 10:
request_uri bytes howmany
/mod_log_sql/style_1.css 157396 1288
/mod_log_sql/ 2514337 801
/mod_log_sql/mod_log_sql.tar.gz 9769312 456
/mod_log_sql/faq.html 5038728 436

Or maybe you want to see who's linking to you? Do this:

SELECT count(referer) AS num,referer
FROM acc_log_tbl
WHERE request_uri='/mod_log_sql/'
GROUP BY referer ORDER BY num DESC;
Table 11:
num referer
271 http://freshmeat.net/projects/mod_log_sql/
96 http://modules.apache.org/search?id=339
48 http://freshmeat.net/
8 http://freshmeat.net

As you can see, there are myriad possibilities that can be constructed with the wonderful SQL SELECT statement. Logging to an SQL database can be really quite useful!

1.2.Why use MySQL? Are there alternatives?

MySQL is a robust, free, and very powerful production-quality database engine. It is well supported and comes with detailed documentation. Many 3rd-party software pacakges (e.g. Slashcode, the engine that powers Slashdot) run exclusively with MySQL. In other words, you will belong to a very robust and well-supported community by choosing MySQL.

That being said, there are alternatives. PostgreSQL is probably MySQL's leading "competitor" in the free database world. There is also an excellent module available for Apache to permit logging to a PostgreSQL database, called pgLOGd

[Note]

Currently a database abstraction system is in the works to allow any database to be used with mod_log_sql.

1.3.Is this code production-ready?

By all accounts it is. It is known to work without a problem on many-thousands-of-hits-per-day webservers. Does that mean it is 100% bug free? Well, no software is, but it is well-tested and believed to be fully compatible with production environments. (The usual disclaimers apply. This software is provided without warranty of any kind.)

1.4.Who's using mod_log_sql?

Good question! It would be great to find out! If you are a production-level mod_log_sql user, please contact eddie at so that you can be mentioned here.

1.5. Why doesn't the module also replace the Apache ErrorLog?

There are circumstances when that would be quite unwise -- for example, if Apache could not reach the MySQL server for some reason and needed to log that fact. Without a text-based error log you'd never know anything was wrong, because Apache would be trying to log a database connection error to the database... you get the point.

Error logs are usually not very high-traffic and are really best left as text files on a web server machine.

The Error log is free format text.. (no specified formatting what, so ever) which is rather difficult to nicely format for storing in a database.

1.6.Does mod_log_sql work with Apache 2.x?

Yes. A port of mod_log_sql is available for Apache 2.x as of mod_log_sql 1.90

1.7. Does mod_log_sql connect to MySQL via TCP/IP or a socket?

Quick answer, Yes.

It depends! This is not determined by mod_log_sql. mod_log_sql relies on a connection command that is supplied in the MySQL API, and that command is somewhat intelligent. How it works:

  • if the specified MySQL database is on the same machine, the connection command uses a socket to communicate with MySQL

  • if the specified MySQL database is on a different machine, mod_log_sql connects using TCP/IP.

You don't have any control of which methodology is used. You can fine-tune some of the configuration, however. The LogSQLSocketFile runtime configuration directive overrides the default of "/var/lib/mysql/mysql.sock" for socket-based connections, whereas the LogSQLTCPPort command allows to you override the default TCP port of 3306 for TCP/IP connections.

1.8.I have discovered a bug. Who can I contact?

Please contact Edward Rudd at , or post a message to the mod_log_sql Mailing Lists . Your comments, suggestions, bugfixes, bug catches, and usage testimonials are always welcome. As free software, mod_log_sql is intended to be a community effort -- any code contributions or other ideas will be fully and openly credited, of course.

Problems

2.1. Apache segfaults or has other problems when using PHP and mod_log_sql

This occurs if you compiled PHP with MySQL database support. PHP utilizes its internal, bundled MySQL libraries by default. These conflict with the "real" MySQL libraries linked by mod_log_sql, causing the segmentation fault.

PHP and mod_log_sql can be configured to happily coexist. The solution is to configure PHP to link against the real MySQL libraries: recompile PHP using --with-mysql=/your/path. Apache will run properly once the modules are all using the same version of the MySQL libraries.

2.2. Apache appears to start up fine, but nothing is getting logged in the database

If you do not see any entries in the access_log, then something is preventing the inserts from happening. This could be caused by several things:

  • Improper privileges set up in the MySQL database

  • You are not hitting a VirtualHost that has a LogSQLTransferLogTable entry

  • You did not specify the right database host or login information

  • Another factor is preventing a connection to the database

[Note]

It is improper to ask for help before you have followed these steps.

First examine the MySQL log that you established in step of section Preparing MySQL for logging . Ensure that the INSERT statements are not being rejected because of a malformed table name or other typographical error. By enabling that log, you instructed MySQL to log every connection and command it receives -- if you see no INSERT attempts in the log, the module isn't successfully connecting to the database. If you see nothing at all in the log -- not even a record of your administrative connection attempts, then you did not enable the log correctly. If you do see INSERT attempts but they are failing, the log should tell you why.

Second, confirm that your LogSQL* directives are all correct.

Third, examine the Apache error logs for messages from mod_log_sql; the module will offer hints as to why it cannot connect, etc.

The next thing to do is to change the LogLevel directive in the main server config as well as in each VirtualHost config:

LogLevel debug
ErrorLog /var/log/httpd/server-messages
2.3. Why do I get the message "insufficient configuration info to establish database link" in my Apache error log?

At a minimum, LogSQLLoginInfo in the URl form and either LogSQLTableName or LogSQLMassVirtualHosting must be defined in order for the module to be able to establish a database link. If these are not defined or are incomplete you will receive this error message.

2.4. My database cannot handle all the open connections from mod_log_sql, is there anything I can do?

The rule of thumb: if you have n webservers each configured to support y MaxClients, then your database must be able to handle n times y simultaneous connections in the worst case. Certainly you must use common sense, consider reasonable traffic expectations and structure things accordingly.

Tweaking my.cnf to scale to high connection loads is imperative. But if hardware limitations prevent your MySQL server from gracefully handling the number of incoming connections, it would be beneficial to upgrade the memory or CPU on that server in order to handle the load.

Jeremy Zawodny, a highly respected MySQL user and contributor to Linux Magazine, has this very helpful and highly appropriate article on tuning MySQL: MySQL, Linux, and Thread Caching

Please remember that mod_log_sql's overriding principle is performance -- that is what the target audience demands and expects. Other database logging solutions do not open and maintain many database connections, but their performance suffers drastically. For example, pgLOGd funnels all log connections through a separate daemon that connects to the database, but that bottlenecks the entire process. mod_log_sql achieves performance numbers an order of magnitude greater than the alternatives because it dispenses with the overhead associated with rapid connection cycling, and it does not attempt to shoehorn all the database traffic through a single extra daemon or proxy process.

[Note]

Currently connection pooling is being implemented as part of the Database Abstraction layer to allow multiple httpd processes to share connections.

2.5. Why do I occasionally see a "lost connection to MySQL server" message in my Apache error log?

This message may appear every now and then in your Apache error log, especially on very lightly loaded servers. This does not mean that anything is necessarily wrong. Within each httpd child process, mod_log_sql will open (and keep open) a connection to the MySQL server. MySQL, however, will close connections that have not been used in a while; the default timeout is 8 hours. When this occurs, mod_log_sql will notice and re-open the connection. That event is what is being logged, and looks like this:

[Tue Nov 12 19:04:10 2002] [error] mod_log_sql: first attempt failed,
  API said: error 2013, Lost connection to MySQL server during query
[Tue Nov 12 19:04:10 2002] [error] mod_log_sql: reconnect successful
[Tue Nov 12 19:04:10 2002] [error] mod_log_sql: second attempt successful

Reference: MySQL documentation

2.6. Sometimes a single VirtualHost gets logged to two different tables (e.g. access_foo_com, access_www_foo_com). Or, accesses to an unqualified hostname (e.g. "http://intranet/index.html") get logged in separate tables.

Proper usage of the Apache runtime ServerName directive and the directive UseCanonicalName On (or DNS) are necessary to prevent this problem. "On" is the default for UseCanonicalName, and specifies that self-referential URLs are generated from the ServerName part of your VirtualHost:

With UseCanonicalName on (and in all versions prior to 1.3) Apache will use the ServerName and Port directives to construct the canonical name for the server. With UseCanonicalName off Apache will form self-referential URLs using the hostname and port supplied by the client if any are supplied (otherwise it will use the canonical name, as defined above). [From the Apache documentation ]

The module inherits Apache's "knowledge" about the server name being accessed. As long as those two directives are properly configured, mod_log_sql will log to only one table per virtual host while using LogSQLMassVirtualHosting.

Performance and Tuning

3.1.How well does it perform?

mod_log_sql scales to very high loads. Apache 1.3.22 + mod_log_sql was benchmarked using the "ab" (Apache Bench) program that comes with the Apache distribution; here are the results.

Overall configuration

  • Machine A: Apache webserver

  • Machine B: MySQL server

  • Machines A and B connected with 100Mbps Ethernet

  • Webserver: Celeron 400, 128MB RAM, IDE storage

Apache configuration

Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MinSpareServers 5
StartServers 10
MaxSpareServers 15
MaxClients 256
MaxRequestsPerChild 5000
LogSQLTransferLogFormat AbHhmRSsTUuvc
LogSQLWhichCookie Clicks
CookieTracking on
CookieName Clicks

"ab" commandline

./ab -c 10 -t 20 -v 2 -C Clicks=ab_run http://www.hostname.com/target

( 10 concurrent requests; 20 second test; setting a cookie "Clicks=ab_run"; target = the mod_log_sql homepage. )

Ten total ab runs were conducted: five with MySQL logging enabled, and five with all MySQL directives commented out of httpd.conf. Then each five were averaged. The results:

  • Average of five runs employing MySQL and standard text logging: 139.01 requests per second, zero errors.

  • Average of five runs employing only standard text logging: 139.96 requests per second, zero errors.

In other words, any rate-limiting effects on this particular hardware setup are not caused by MySQL. Note that although this very simple webserver setup is hardly cutting-edge -- it is, after all, a fairly small machine -- 139 requests per second equal over twelve million hits per day.

If you run this benchmark yourself, take note of three things:

  1. Use a target URL that is on your own webserver :-).

  2. Wait until all your connections are closed out between runs; after several thousand requests your TCP/IP stack will be filled with hundreds of connections in TIME_WAIT that need to close. Do a "netstat -t|wc -l" on the webserver to see. If you don't wait, you can expect to see a lot of messages like "ip_conntrack: table full, dropping packet" in your logs. (This has nothing to do with mod_log_sql, this is simply the nature of the TCP/IP stack in the Linux kernel.)

  3. When done with your runs, clean these many thousands of requests out of your database:

    mysql> delete from access_log where agent like 'ApacheBench%';
    mysql> optimize table access_log;
3.2. Do I need to be worried about all the running MySQL children? Will holding open n Apache-to-MySQL connections consume a lot of memory?

Short answer: you shouldn't be worried.

Long answer: you might be evaluating at the output of "ps -aufxw" and becoming alarmed at all the 7MB httpd processes or 22MB mysqld children that you see. Don't be alarmed. It's true that mod_log_sql opens and holds open many MySQL connections: each httpd child maintains one open database connection (and holds it open for performance reasons). Four webservers, each running 20 Apache children, will hold open 80 MySQL connections, which means that your MySQL server needs to handle 80 simultaneous connections. In truth, your MySQL server needs to handle far more than that if traffic to your website spikes and the Apache webservers spawn off an additional 30 children each...

Fortunately the cost reported by 'ps -aufxw' is deceptive. This is due to an OS memory-management feature called "copy-on-write." When you have a number of identical child processes (e.g. Apache, MySQL), it would appear in "ps" as though each one occupies a great deal of RAM -- as much as 7MB per httpd child! In actuality each additional child only occupies a small bit of extra memory -- most of the memory pages are common to each child and therefore shared in a "read-only" fashion. The OS can get away with this because the majority of memory pages for one child are identical across all children. Instead of thinking of each child as a rubber stamp of the others, think of each child as a basket of links to a common memory area.

A memory page is only duplicated when it needs to be written to, hence "copy-on-write." The result is efficiency and decreased memory consumption. "ps" may report 7MB per child, but it might really only "cost" 900K of extra memory to add one more child. It is not correct to assume that 20 Apache children with a VSZ of 7MB each equals (2 x 7MB) of memory consumption -- the real answer is much, much lower. The same "copy-on-write" rules apply to all your MySQL children: 40 mysqld children @ 22MB each do not occupy 880MB of RAM.

The bottom line: although there is a cost to spawn extra httpd or mysqld children, that cost is not as great as "ps" would lead you to believe.

3.3. My webserver cannot handle all the traffic that my site receives, is there anything I can do?

If you have exhausted all the tuning possibilities on your existing server, it is probably time you evaluated the benefits of clustering two or more webservers together in a load-balanced fashion. In fact, users of such a setup are mod_log_sql's target audience!

3.4. What is the issue with activating delayed inserts?

INSERT DELAYED is a specific syntax to MySQL and is not supported by any other database. Ergo, why is it needed, and what MySQL deficiency is it working around? INSERT DELAYED is a kluge.

The MySQL documentation is unclear whether INSERT DELAYED is even necessary for an optimized database. It says, "The DELAYED option for the INSERT statement is a MySQL-specific option that is very useful if you have clients that can't wait for the INSERT to complete." But then it goes on to say, "Note that as MyISAM tables supports concurrent SELECT and INSERT, if there is no free blocks in the middle of the data file, you very seldom need to use INSERT DELAYED with MyISAM."

Because INSERT DELAYED returns without waiting for the data to be written, a hard kill of your MySQL database at the right (wrong?) moment could lose those logfile entries.

As of MySQL version 3.23.52, the error return functions disagree after a failed INSERT DELAYED: mysql_errno() always returns 0, even if mysql_error() returns a textual error. I have reported this bug to the MySQL folks. However, we have no way of knowing what solution they will adopt to fix this, and with the worst case solution mod_log_sql would not be able to tell if anything went wrong with a delayed insert.

Instead of delayed inserts, you may wish to utilize InnoDB tables (instead of the standard MyISAM tables). InnoDB tables suppot row-level locking and are recommended for high-volume databases.

If after understanding these problems you still wish to enable delayed inserts, section Optimizing for a busy database discusses how.

"How do I...?" -- accomplishing certain tasks

4.1. How do I extract the data in a format that my analysis tool can understand?

mod_log_sql would be virtually useless if there weren't a way for you to extract the data from your database in a somewhat meaningful fashion. To that end there's a Perl script enclosed with the distribution. That script (make_combined_log.pl) is designed to extract N-many days worth of access logs and provide them in a Combined Log Format output. You can use this very tool right in /etc/crontab to extract logs on a regular basis so that your favorite web analysis tool can read them. Or you can examine the Perl code to construct your own custom tool.

For example, let's say that you want your web statistics updated once per day in the wee hours of the morning. A good way to accomplish that could be the following entries in /etc/crontab:

# Generate the temporary apache logs from the MySQL database (for webalizer)
05 04 * * * root make_combined_log.pl 1 www.grubbybaby.com > /var/log/temp01
# Run webalizer on httpd log
30 04 * * * root webalizer -c /etc/webalizer.conf; rm -f /var/log/temp01

Or if you have a newer system that puts files in /etc/cron.daily etc., create a file called "webalizer" in the cron.daily subdirectory. Use the following as the contents of your file, and make sure to chmod 755 it when done.

#!/bin/sh
/usr/local/sbin/make_combined_log.pl 1 www.yourdomain.com > /var/log/httpd/templog
/usr/local/bin/webalizer -q -c /etc/webalizer.conf
rm -f /var/log/httpd/templog

See? Easy.

4.2.How can I log mod_usertrack cookies?

A number of people like to log mod_usertrack cookies in their Apache TransferLog to aid in understanding their visitors' clickstreams. This is accomplished, for example, with a statement as follows:

LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"" \"%{cookie}n\""

Naturally it would be nice for mod_log_sql to permit the admin to log the cookie data as well, so as of version 1.10 you can do this. You need to have already compiled mod_usertrack into httpd -- it's one of the standard Apache modules.

First make sure you have a column called "cookie" in the MySQL database to hold the cookies, which can be done as follows if you already have a working database:

mysql> alter table acc_log_tbl add column cookie varchar(255);

Next configure your server to set usertracking cookies as follows, and make sure you include the new 'c' directive in your LogSQLTransferLogFormat, which activates cookie logging. Here's an example:

<VirtualHost 1.2.3.4>
 CookieTracking on
 CookieStyle Cookie
 CookieName Foobar
 LogSQLTransferLogFormat huSUsbTvRAc
 LogSQLWhichCookie Foobar
</VirtualHost>

The first three lines configure mod_usertrack to create a COOKIE (RFC 2109) format cookie called Foobar. The last two lines tell mod_log_sql to log cookies named Foobar. You have to choose which cookie to log because more than one cookie can/will be sent to the server by the client.

Recap: the 'c' character activates cookie logging, and the LogSQLWhichCookie directive chooses which cookie to log.

FYI, you are advised NOT to use CookieStyle Cookie2 -- it seems that even newer browsers (IE 5.5, etc.) have trouble with the new COOKIE2 (RFC 2965) format. Just stick with the standard COOKIE format and you'll be fine.

Perform some hits on your server and run a select

SELECT request_uri,cookie
FROM access_log
WHERE cookie IS NOT NULL;
Table 12:
request_uri cookie
/mod_log_sql/ ool-18e4.dyn.optonline.net.130051007102700823
/mod_log_sql/usa.gif ool-18e4.dyn.optonline.net.130051007102700823
/mod_log_sql/style_1.css ool-18e4.dyn.optonline.net.130051007102700823
4.3. What if I want to log more than one cookie? What is the difference between LogSQLWhichCookie and LogSQLWhichCookies?

As of version 1.17, you have a choice in how you want cookie logging handled.

If you are interested in logging only one cookie per request, follow the instructions in FAQ entry above. That cookie will be logged to a column in the regular access_log table, and the actual cookie you want to log is specified with LogSQLWhichCookie. Don't forget to specify the 'c' character in LogSQLTransferLogFormat.

If, however, you need to log multiple cookies per request, you must employ the LogSQLWhichCookies (note the plural) directive. The cookies you specify will be logged to a separate table (as discussed in section Logging many-to-one data in separate tables ), and entries in that table will be linked to the regular access_log entries via the unique ID that is supplied by mod_unique_id. Without mod_unique_id the information will still be logged but you will be unable to correlate which cookies go with which access-requests. Furthermore, with LogSQLWhichCookies, you do not need to include the 'c' character in LogSQLTransferLogFormat.

LogSQLWhichCookie and LogSQLWhichCookies can coexist without conflict because they operate on entireley different tables, but you're better off choosing the one you need.

4.4. What are the SSL logging features, and how do I activate them?
[Note]

You do not need to compile SSL support into mod_log_sql in order to simply use it with a secure site. You only need to compile SSL support into mod_log_sql if you want to log SSL-specific data such as the cipher type used, or the keysize that was negotiated. If that information is unimportant to you, you can ignore this FAQ.

By adding certain characters to your LogSQLTransferLogFormat string you can tell mod_log_sql to log the SSL cipher, the SSL keysize of the connection, and the maximum keysize that was available. This would let you tell, for example, which clients were using only export-grade security to access your secure software area.

You can compile mod_log_sql with SSL logging support if you have the right packages installed. If you already have an SSL-enabled Apache then you by definition have the correct packages already installed: OpenSSL and mod_ssl.

You need to ensure that your database is set up to log the SSL data. Issue the following commands to MySQL if your access table does not already have them:

mysql> alter table access_log add column ssl_cipher varchar(25);
mysql> alter table access_log add column ssl_keysize smallint unsigned;
mysql> alter table access_log add column ssl_maxkeysize smallint unsigned;

Finally configure httpd.conf to activate the SSL fields. Note that this is only meaningful in a VirtualHost that is set up for SSL.

<VirtualHost 1.2.3.4:443>
 LogSQLTransferLogFormat AbHhmRSsTUuvcQqz
</VirtualHost>

You also need to make sure you have the mod_log_sql_ssl module loaded as well.

The last three characters (Qqz) in the directive are the SSL ones; see section in the directives documentation for details of the LogSQLTransferLogFormat directive.

Restart Apache, then perform some hits on your server. Then run the following select statement:

SELECT remote_host,request_uri,ssl_cipher,ssl_keysize,ssl_maxkeysize
FROM access_log
WHERE ssl_cipher IS NOT NULL;
Table 13:
remote_host request_uri ssl_cipher ssl_keysize ssl_maxkeysize
216.192.52.4 /dir/somefile.html RC4-MD5 128 128
216.192.52.4 /dir/somefile.gif RC4-MD5 128 128
216.192.52.4 /dir/somefile.jpg RC4-MD5 128 128