Performance bottleneck symptoms

CPU Bottleneck Symptoms:

Symptoms of a CPU bottleneck include the following.
The Processor(_Total)\% Processor Time counter (which measures the total utilization of your processor by all running processes) will be high. If the server typically runs at around 70% or 80% processor utilization, this is normally a good sign: the machine is handling its load effectively and is not underutilized. Average processor utilization of around 20% or 30%, on the other hand, suggests that the machine is underutilized and may be a good candidate for server consolidation using Virtual Server or VMware.
To break down % Processor Time further, monitor the counters Processor(_Total)\% Privileged Time and Processor(_Total)\% User Time, which show processor utilization for kernel-mode and user-mode processes respectively. If kernel-mode utilization is high, the machine is likely underpowered, because it is too busy handling basic OS housekeeping functions to run other applications effectively. If user-mode utilization is high, the server may be running too many roles, and you should either beef up the hardware by adding another processor or migrate an application or role to another box. A System\Processor Queue Length (an indication of how many threads are waiting for execution) that is consistently greater than 2 on a single-processor machine is a clear indication of a processor bottleneck. Also look at other counters such as ASP\Requests Queued or ASP.NET\Requests Queued.
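The rules of thumb above can be sketched as a small check over averaged counter values. This is only an illustration: the function name `diagnose_cpu`, the 50% kernel-time share, and the 80% cut-off for "high" user-mode load are assumptions that do not come from the text. Only the 30% and queue-length thresholds do.

```python
def diagnose_cpu(total_pct, privileged_pct, user_pct, queue_length, processors=1):
    """Return a list of findings for averaged CPU counter values."""
    findings = []
    # Processor(_Total)\% Processor Time: 20-30% average suggests underutilization.
    if total_pct <= 30:
        findings.append("underutilized: candidate for consolidation")
    # System\Processor Queue Length: more than 2 waiting threads per processor.
    if queue_length > 2 * processors:
        findings.append("processor queue too long: CPU bottleneck")
    # Assumption: kernel time above half of the total counts as "high".
    if total_pct > 0 and privileged_pct / total_pct > 0.5:
        findings.append("kernel-mode time dominates: machine may be underpowered")
    # Assumption: above 80% total with mostly user-mode time counts as "high".
    elif total_pct > 80 and user_pct > privileged_pct:
        findings.append("user-mode time dominates: too many roles on this server")
    return findings


if __name__ == "__main__":
    for finding in diagnose_cpu(95, 15, 80, 6, processors=2):
        print(finding)
```

The inputs are meant to be averages over a monitoring window, since a single sample of any of these counters says little.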

Tips to find out Application server bottlenecks:

  1. A sharp increase in application server processing time as the load increases.
  2. One or more page components take longer even though the database call for the same request executes quickly.
  3. Static files have low response times, whereas dynamic content (servlets, JSP, etc.) takes longer.
  4. Network delay is negligible.
  5. The home page is displayed within a few seconds even during the stress period (as it is fetched from the web server).
  6. Hits/sec and throughput remain low.
  7. The CPU, memory, or disk of the application server shows bottleneck symptoms.
  8. The number of HTTP/HTTPS connections established does not increase proportionally with the load.
  9. The number of new connections established is very high and the number of reused connections is very low.

Tips to find out Web server bottlenecks:

  1. The ‘Server Time’ component of the response-time breakdown increases.
  2. One or more page components of a transaction take longer even though the DB query executes quickly.
  3. Static files have higher response times than dynamic content (servlets, JSP, etc.).
  4. Network delay is negligible.
  5. The home page takes longer to display.
  6. Hits/sec on the web server is very low.
  7. The CPU, memory, or disk of the web server shows bottleneck symptoms.

Hardware Malfunctioning Symptoms:

  1. Monitor System\Context Switches/sec, which measures how often the processors switch from one thread to another, including switches from user mode to kernel mode to handle a request from a thread running in user mode. If this counter suddenly starts increasing, it may be an indication of a malfunctioning device, especially if you see a similar jump in the Processor(_Total)\Interrupts/sec counter on your machine.
  2. You may also want to check the Processor(_Total)\% Privileged Time counter and see whether it shows a similar unexplained increase, as this may indicate problems with a device driver that is causing additional kernel-mode processor utilization.
  3. If Processor(_Total)\Interrupts/sec does not correlate well with System\Context Switches/sec, the sudden jump in context switches may instead mean that your application is hitting its scalability limit on that machine, and you may need to scale out the application (for example by clustering) or possibly redesign how it handles user-mode requests. In any case, it is a good idea to monitor System\Context Switches/sec over a period of time to establish a baseline for this counter, and then create a perfmon alert that triggers when the counter deviates significantly from its observed mean value.
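The baseline-and-alert idea can be sketched as follows. The text does not define "deviates significantly", so the three-standard-deviation rule used here is an assumption, and the sample values are made up for illustration.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical counter samples as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def deviates(value, baseline, k=3.0):
    """True when value is more than k standard deviations from the mean.

    k=3.0 is an assumed definition of a "significant" deviation.
    """
    avg, sd = baseline
    return abs(value - avg) > k * sd


# Hypothetical System\Context Switches/sec samples from a quiet period.
HISTORY = [9800, 10100, 10050, 9900, 10150]

if __name__ == "__main__":
    baseline = build_baseline(HISTORY)
    for sample in (10200, 25000):
        print(sample, "ALERT" if deviates(sample, baseline) else "ok")
```

In practice the baseline should cover a representative period (for example, the same hours on several business days), otherwise normal daily peaks will trip the alert.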

Memory Bottleneck Symptoms:

When it comes to the System memory, there are 3 things to monitor:

  1. Cache (Hits/Misses)
  2. Memory (Available Bytes, Process\Working Set)
  3. Paging (Pages Read/Sec, Pages Input/Sec, Page Faults/Sec, % Disk Processing)

Start with Memory\Available Bytes. If this counter is greater than 10% of the actual RAM in your machine, you probably have more than enough RAM and don’t need to worry. The Memory\Pages/sec counter indicates the number of paging operations to disk during the measuring interval, and it is the primary counter to watch for signs that RAM is insufficient to meet your server’s needs. You can monitor Process(instance)\Working Set for each process instance to determine which process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. A related counter is Memory\Cache Bytes, which measures the working set for the system, i.e. the number of allocated pages kernel threads can address without generating a page fault. Finally, another corroborating indicator of insufficient RAM is Memory\Transition Faults/sec, which measures how often recently trimmed pages on the standby list are re-referenced. If this counter slowly starts to rise over time, it could also indicate that you are reaching a point where you no longer have enough RAM for your server to function well.
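The two memory checks described above (available memory against the 10% rule, and a counter that slowly rises over time) might be sketched like this. The function names are illustrative, and using a least-squares slope to detect a "slow rise" is an assumption; the text does not prescribe a method.

```python
def available_memory_ok(available_bytes, total_ram_bytes):
    """Memory\\Available Bytes should exceed 10% of installed RAM."""
    return available_bytes > 0.10 * total_ram_bytes


def trend_slope(samples):
    """Least-squares slope of equally spaced samples (needs at least 2).

    A positive slope on Memory\\Transition Faults/sec or a process working
    set means the counter is rising over the observation period.
    """
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    gib = 2 ** 30
    print(available_memory_ok(2 * gib, 8 * gib))
    print(trend_slope([10, 12, 14, 16]))
```

A slope is only meaningful over a long enough window; short bursts of paging during a deployment or a backup will also produce a positive slope.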

Disk Bottleneck Symptoms:

A disk bottleneck can significantly impact response time for applications running on your system. Monitor the Physical Disk(instance)\Disk Transfers/sec counter for each physical disk; if it goes above 25 disk I/Os per second, you have poor response time for that disk. Also track Physical Disk(instance)\% Idle Time, which measures the percentage of time that the hard disk is idle during the measurement interval. If this counter falls below 20%, read/write requests are likely queuing up for a disk that is unable to service them in a timely fashion. In this case it is time to upgrade your hardware to faster disks or scale out your application to better handle the load. Look at Physical Disk(instance)\Avg. Disk Queue Length and Physical Disk(instance)\Current Disk Queue Length for more details on the queued requests.
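As a rough sketch, the two disk thresholds above can be applied per disk instance. The instance names and sample values below are hypothetical; only the limits of 25 transfers per second and 20% idle time come from the text.

```python
def diagnose_disks(samples, max_transfers=25, min_idle_pct=20):
    """samples maps a disk instance name to (transfers_per_sec, idle_pct).

    Returns only the disks that break at least one threshold.
    """
    report = {}
    for disk, (transfers, idle_pct) in samples.items():
        problems = []
        if transfers > max_transfers:      # Disk Transfers/sec above 25
            problems.append("high transfer rate")
        if idle_pct < min_idle_pct:        # % Idle Time below 20%
            problems.append("low idle time")
        if problems:
            report[disk] = problems
    return report


# Hypothetical averaged samples for two disk instances.
SAMPLES = {"0 C:": (12.0, 85.0), "1 D:": (140.0, 8.0)}

if __name__ == "__main__":
    print(diagnose_disks(SAMPLES))
```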

Network Performance/Bottlenecks:

The first step is to monitor network performance and make sure it is good. There are some simple ways to do so. First, check whether you are getting the bandwidth you are supposed to get. The easiest way is to compare the Current Bandwidth counter with your expected bandwidth. Also verify the rate at which the server sends and receives data. Network performance depends on two factors: the network cards and the interfaces (switches/routers) configured on the servers.

Here are some of the counters to find network bottlenecks:

Network Interface: Current Bandwidth
This counter reports the current bandwidth of the network interface. Capture its value and correlate it with Bytes Received/sec, Bytes Sent/sec, and Bytes Total/sec.
Bytes Total/sec should stay at or below about half of the total bandwidth. If it does not, a network bottleneck is likely.

Network Interface: Bytes Total/sec:
To determine if your network connection is creating a bottleneck, compare the Network Interface: Bytes Total/sec counter to the total bandwidth of your network adapter card. To allow headroom for spikes in traffic, you should usually be using no more than 50 percent of capacity. If this number is very close to the capacity of the connection, and processor and memory use are moderate, then the connection may well be a problem. To determine the network utilization (throughput on a server’s network cards), you can check the following counters:

  1. Network Interface\Bytes Received/sec
  2. Network Interface\Bytes Sent/sec
  3. Network Interface\Bytes Total/sec
  4. Network Interface\Current Bandwidth

If the Bytes Total/sec value is more than 50 percent of the total bandwidth under an average user/work load, your server is likely to have problems under peak load conditions. Make sure you compare the network counter values with Physical Disk\% Disk Time and Processor\% Processor Time. If the disk time and processor time values are low but the network values are very high, there might be a problem with your network.
There are two ways to solve this problem:

  1. By optimizing the network card settings
  2. By adding an additional network card.
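The 50 percent rule can be checked with a little arithmetic. One detail matters: Current Bandwidth is reported in bits per second, while Bytes Total/sec is in bytes, so the byte rate has to be multiplied by 8 before comparing. A minimal sketch, with illustrative function names:

```python
def network_utilization(bytes_total_per_sec, current_bandwidth_bits):
    """Fraction of the link in use.

    bytes_total_per_sec: Network Interface\\Bytes Total/sec (bytes).
    current_bandwidth_bits: Network Interface\\Current Bandwidth (bits/sec).
    """
    return bytes_total_per_sec * 8 / current_bandwidth_bits


def is_network_bottleneck(bytes_total_per_sec, current_bandwidth_bits, limit=0.5):
    """Apply the 50 percent headroom rule from the text."""
    return network_utilization(bytes_total_per_sec, current_bandwidth_bits) > limit


if __name__ == "__main__":
    # 75 MB/s on a 1 Gbit/s adapter.
    print(network_utilization(75_000_000, 1_000_000_000))
    print(is_network_bottleneck(75_000_000, 1_000_000_000))
```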

Analyzing IIS logs with LogParser

When users access your server running IIS, IIS logs information about each request. The logs provide valuable information that you can use to identify any unauthorized attempts to compromise your Web server.
Depending on the amount of traffic to your Web site, the size of your log file (or the number of log files) can consume valuable disk space, memory resources, and CPU cycles. You might need to balance the gathering of detailed data with the need to limit files to a manageable size and number. Logging information in IIS goes beyond the scope of the event logging or performance monitoring features provided by Windows. The IIS logs can include information, such as who has visited your site, what the visitor viewed, and when the information was last viewed.

IIS log file format:

IIS log file format is a fixed (meaning that it cannot be customized) ASCII format. This file format records more information than other log file formats, including basic items, such as the IP address of the user, user name, request date and time, service status code, and number of bytes received. In addition, IIS log file format includes detailed items, such as the elapsed time, number of bytes sent, action (for example, a download carried out by a GET command), and target file. The IIS log file is an easier format to read than the other ASCII formats because the information is separated by commas, while most other ASCII log file formats use spaces for separators. Time is recorded as local time.

IIS log file location:

The IIS logs provide a great deal of information about the activity of a Web application. You can find the IIS logs in

systemroot\System32\LogFiles\W3SVCnumber, where number is the site ID for the Web site.

LogParser:

LogParser is a command-line utility. By default it works like a “data processing pipeline”: it takes an SQL expression on the command line and outputs the lines that match that expression.

Download LogParser from:
http://www.microsoft.com/downloads/en/details.aspx?familyid=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en

W3C Extended Logging Field Definitions


| Prefix | Meaning |
|--------|---------|
| s-  | Server actions |
| c-  | Client actions |
| cs- | Client-to-server actions |
| sc- | Server-to-client actions |


| Field | Appears As | Description |
|-------|------------|-------------|
| Date | date | The date that the activity occurred. |
| Time | time | The time that the activity occurred. |
| Client IP Address | c-ip | The IP address of the client that accessed your server. |
| User Name | cs-username | The name of the authenticated user who accessed your server. This does not include anonymous users, who are represented by a hyphen (-). |
| Service Name | s-sitename | The Internet service and instance number that was accessed by a client. |
| Server Name | s-computername | The name of the server on which the log entry was generated. |
| Server IP Address | s-ip | The IP address of the server on which the log entry was generated. |
| Server Port | s-port | The port number the client is connected to. |
| Method | cs-method | The action the client was trying to perform (for example, a GET method). |
| URI Stem | cs-uri-stem | The resource accessed; for example, Default.htm. |
| URI Query | cs-uri-query | The query, if any, the client was trying to perform. |
| Protocol Status | sc-status | The status of the action, in HTTP or FTP terms. |
| Win32® Status | sc-win32-status | The status of the action, in terms used by Microsoft Windows®. |
| Bytes Sent | sc-bytes | The number of bytes sent by the server. |
| Bytes Received | cs-bytes | The number of bytes received by the server. |
| Time Taken | time-taken | The duration of time, in milliseconds, that the action consumed. |
| Protocol Version | cs-version | The protocol (HTTP, FTP) version used by the client. For HTTP this will be either HTTP 1.0 or HTTP 1.1. |
| Host | cs-host | The content of the host header. |
| User Agent | cs(User-Agent) | The browser used on the client. |
| Cookie | cs(Cookie) | The content of the cookie sent or received, if any. |
| Referrer | cs(Referer) | The previous site visited by the user. This site provided a link to the current site. |



The following is an example of a record in the W3C extended log format produced by Microsoft Internet Information Services (IIS):
--------------------------------------------------------------------------------
#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2011-05-09 22:48:39
#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-version cs(User-Agent) cs(Cookie) cs(Referer)

2011-05-09 22:48:39 192.168.1.5 - 173.201.216.31 GET /GreenBlue.jpg - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.punebids.com
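To show how the #Fields directive drives the meaning of each record, here is a minimal Python sketch that reads W3C extended log lines into dictionaries. It is not how LogParser works internally. The function name `parse_w3c` and the shortened sample log are illustrative assumptions, and the sketch relies on fields being space-separated, with spaces inside values encoded as `+`.

```python
SAMPLE_LOG = """#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2011-05-09 22:48:39
#Fields: date time c-ip cs-method cs-uri-stem sc-status time-taken
2011-05-09 22:48:39 192.168.1.5 GET /GreenBlue.jpg 200 157
"""


def parse_w3c(lines):
    """Yield one dict per log record, keyed by the names in #Fields."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive can reappear mid-file, e.g. after an IIS restart.
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#"):
            continue  # other directives: #Software, #Version, #Date
        elif fields:
            values = line.split()
            if len(values) == len(fields):  # skip malformed records
                yield dict(zip(fields, values))


if __name__ == "__main__":
    for record in parse_w3c(SAMPLE_LOG.splitlines()):
        print(record["cs-uri-stem"], record["sc-status"], record["time-taken"])
```

All values come back as strings; convert `sc-status` or `time-taken` with `int()` before doing arithmetic on them.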

Procedure:

  1. First note the date and timings of the test run.
  2. Collect the IIS log files of that particular timeframe from the Web Servers (specific to your Web Application) and put it in a single folder
  3. Write query for data collection from logs and run it in LogParser

To find URL Hit count:


logparser "SELECT cs-uri-stem AS Url, COUNT(cs-uri-stem) AS Hits FROM 'C:\Logs\WebServers1-4\*.*' WHERE date = TO_TIMESTAMP('2011-05-10', 'yyyy-MM-dd') AND time > TO_TIMESTAMP('01:55:00', 'hh:mm:ss') AND time < TO_TIMESTAMP('02:35:00', 'hh:mm:ss') AND cs-uri-stem LIKE '%asp%' GROUP BY Url ORDER BY Hits DESC" -i:IISW3C -o:CSV > "C:\Logs\CollectedUrlHits.txt"



Output:

Url,Hits

/Framework/website/Default.aspx,15678

/isvs/consulting/userprofile.asp,897

/DNA/Common/Portal/ClientHome.aspx,75

/DNA/Common/Clients/Clientlist.aspx,75

/DNA/Common/portal/DNAuserlist.aspx,75

Statistics:

Elements Processed: 245646

Elements output: 5

Execution time: 5.80 seconds
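For readers without LogParser at hand, the same grouping can be sketched in Python. This is only an illustration of what the query computes, not a replacement for it. The records below are hypothetical dictionaries keyed by W3C field names, and the case-insensitive substring match standing in for LIKE '%asp%' is an assumption.

```python
from collections import Counter
from datetime import time

# Hypothetical parsed records, keyed by W3C field names.
RECORDS = [
    {"date": "2011-05-10", "time": "01:56:10", "cs-uri-stem": "/a/Default.aspx"},
    {"date": "2011-05-10", "time": "02:00:00", "cs-uri-stem": "/a/Default.aspx"},
    {"date": "2011-05-10", "time": "02:10:00", "cs-uri-stem": "/b/user.asp"},
    {"date": "2011-05-10", "time": "02:05:00", "cs-uri-stem": "/img/logo.jpg"},
    {"date": "2011-05-10", "time": "02:40:00", "cs-uri-stem": "/a/Default.aspx"},
]


def url_hits(records, day, start, end, pattern="asp"):
    """Count hits per URL on one day, strictly inside the (start, end) window."""
    hits = Counter()
    for rec in records:
        stamp = time.fromisoformat(rec["time"])
        # Assumption: substring match, ignoring case.
        if rec["date"] == day and start < stamp < end and pattern in rec["cs-uri-stem"].lower():
            hits[rec["cs-uri-stem"]] += 1
    return hits.most_common()  # sorted by hit count, descending


if __name__ == "__main__":
    for url, count in url_hits(RECORDS, "2011-05-10", time(1, 55), time(2, 35)):
        print(f"{url},{count}")
```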

To find the HTTP Error count:


logparser "SELECT cs-uri-stem AS Url, COUNT(cs-uri-stem) AS Hits FROM 'C:\Logs\WebServers1-4\*.*' WHERE date = TO_TIMESTAMP('2011-05-10', 'yyyy-MM-dd') AND time > TO_TIMESTAMP('01:55:00', 'hh:mm:ss') AND time < TO_TIMESTAMP('02:35:00', 'hh:mm:ss') AND cs-uri-stem LIKE '%asp%' AND sc-status = 500 GROUP BY Url ORDER BY Hits DESC" -i:IISW3C -o:CSV > "C:\Logs\Collected500Errors.txt"


Output:

Url,Hits

/Framework/website/Default.aspx,1

/isvs/consulting/userprofile.asp,2

/DNA/Common/Portal/ClientHome.aspx,1

Statistics:

Elements Processed: 245646

Elements output: 3

Execution time: 3.80 seconds
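Once both result sets are available, the error count is easier to interpret as a percentage of hits per URL. The sketch below combines the two outputs shown above. It assumes each CSV holds only the Url,Hits header and data rows (no statistics lines), and the function names are illustrative.

```python
import csv
import io

# The two result sets shown above, inlined as CSV text for the example.
HITS_CSV = """Url,Hits
/Framework/website/Default.aspx,15678
/isvs/consulting/userprofile.asp,897
/DNA/Common/Portal/ClientHome.aspx,75
/DNA/Common/Clients/Clientlist.aspx,75
/DNA/Common/portal/DNAuserlist.aspx,75
"""

ERRORS_CSV = """Url,Hits
/Framework/website/Default.aspx,1
/isvs/consulting/userprofile.asp,2
/DNA/Common/Portal/ClientHome.aspx,1
"""


def read_counts(text):
    """Read a Url,Hits CSV into a {url: count} dictionary."""
    return {row["Url"]: int(row["Hits"]) for row in csv.DictReader(io.StringIO(text))}


def error_rates(hits, errors):
    """Percentage of requests per URL that returned HTTP 500."""
    return {url: 100.0 * errors.get(url, 0) / total
            for url, total in hits.items() if total}


if __name__ == "__main__":
    rates = error_rates(read_counts(HITS_CSV), read_counts(ERRORS_CSV))
    for url, pct in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        print(f"{url}: {pct:.2f}%")
```

Read this way, a page with few hits and a single error can stand out more than the busiest page, which is easy to miss when only the raw error counts are compared.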





P.S. : Thanks to Rahul D.