Wednesday, August 3, 2011

NTLM for PL/SQL

NTLM, or more properly NTLMSSP, is a protocol used on Microsoft Windows systems as part of the so-called Integrated Windows Authentication.

Integrated Windows Authentication is also known as HTTP Negotiate authentication, NT Authentication, NTLM Authentication, Domain authentication, Windows Integrated Authentication, Windows NT Challenge/Response authentication, or simply Windows Authentication.



In Microsoft Internet Information Server (IIS), the system administrator can protect a website or folder with "Integrated Windows Authentication". When you browse to this website or folder, you must enter your Windows (domain) username and password to get access (although Internet Explorer will, depending on your security settings, send your credentials automatically without showing a login dialog box). Note that unlike Basic Authentication, which sends the password as plaintext to the web server, the NTLM protocol does not send the password but rather performs a cryptographic "handshake" with the server to establish your identity.

Use of Integrated Windows Authentication via NTLM on IIS is very common inside many companies (i.e. on intranets and internal web servers), where both the client and web server computers are part of the same, or mutually trusting, domains.


Using NTLM from PL/SQL with UTL_HTTP

Unfortunately, from the PL/SQL developer's perspective, Oracle's UTL_HTTP package does not support NTLM authentication (it only supports Basic authentication via the SET_AUTHENTICATION procedure).

So, if you wanted to retrieve information from your intranet or call a web service (protected by Integrated Windows Authentication) from the database via PL/SQL and UTL_HTTP, you were out of luck.

Until now, that is... :-)


A pure PL/SQL implementation of the NTLM protocol

I came across a Python implementation of the NTLM protocol, and I decided that it should be possible to port this code to PL/SQL. Assisted by a couple of good friends and colleagues, and after a lot of bit-fiddling, reverse-engineering, study of protocol specifications, and liberal use of network packet sniffers, we got it working!

A pure PL/SQL implementation of the NTLM protocol is now available and included in the Alexandria Utility Library for PL/SQL.

The code is organized into two packages: NTLM_UTIL_PKG, which contains protocol-specific functions, and NTLM_HTTP_PKG, which is the package you actually use to make HTTP callouts, and which handles the initial NTLM "handshaking" with the web server.


Example 1: Simple request

This code simply grabs the page you direct it towards, and returns the contents as a CLOB. (What is really going on behind the scenes is a series of requests and responses to establish the authenticated connection, before the actual URL content is served.)


declare
  l_clob clob;
begin
  debug_pkg.debug_on;
  l_clob := ntlm_http_pkg.get_response_clob('http://servername/page', 'domain\username', 'password');
  debug_pkg.print(substr(l_clob, 1, 32000));
end;




Example 2: Web service call

Here, a (persistent) connection is explicitly established before making one or more requests to the server. Note the return value from the BEGIN_REQUEST function, which is the authorization string that must be passed along in the "Authorization" HTTP header on any subsequent requests. The connection is then closed via END_REQUEST. Note that NTLM is a connection-based protocol, and it will not work without the use of persistent connections.



declare
  l_url           varchar2(2000) := 'http://servername/page';
  l_ntlm_auth_str varchar2(2000);
  l_xml           xmltype;
  l_soap_env      clob := 'your_soap_envelope_here';
  
begin
  debug_pkg.debug_on;


  -- perform the initial request to set up a persistent, authenticated connection
  l_ntlm_auth_str := ntlm_http_pkg.begin_request (l_url, 'domain\username', 'password');


  -- pass authorization header to next call(s)
  apex_web_service.g_request_headers(1).name := 'Authorization';
  apex_web_service.g_request_headers(1).value := l_ntlm_auth_str;


  -- perform the actual call
  -- NOTE: for this to work, you must be using a version of apex_web_service that allows persistent connections (fixed in Apex 4.1 ???)
  --       see http://jastraub.blogspot.com/2008/06/flexible-web-service-api.html?showComment=1310198286769#c8685039598916415836
  l_xml := apex_web_service.make_request(l_url, 'soap_action_name_here', '1.1', l_soap_env);


  -- or use the latest version of flex_ws_api
  -- flex_ws_api.g_request_headers(1).name := 'Authorization';
  -- flex_ws_api.g_request_headers(1).value := l_ntlm_auth_str;
  -- l_xml := flex_ws_api.make_request(l_url, 'soap_action_name_here', '1.1', l_soap_env);


  -- this will close the persistent connection
  ntlm_http_pkg.end_request;


  debug_pkg.print('XML response from webservice', l_xml);
end;




Remarks


  • Tested successfully on Oracle 10g XE (with AL32UTF8 character set) and Oracle 10g EE (with WE8MSWIN1252 character set).
  • Tested successfully against IIS 6.0 with non-SSL "plain" website and SSL-enabled Sharepoint website (both set up with Integrated Windows Authentication, obviously).
  • The current version ignores cookies when setting up the connection. If you depend on cookies being present, you may have to deal with this specifically.


Given the diverse nature of network configurations, there may be bugs or unhandled cases in the code. So please test the code in your environment and leave a comment below, letting me know whether it works for you or not.

New version of Alexandria Utility Library for PL/SQL

I've just uploaded a new version of the Alexandria Utility Library for PL/SQL.

Updates include both small bug fixes and some major new features (which I'll return to in another post).

Among the improvements are:


  • Additional functions in OOXML_UTIL_PKG for working with Excel 2007 and PowerPoint 2007 files.
  • Kris Scorup has contributed improved CSV parsing to the CSV_UTIL_PKG. It now handles double quotes and separator characters inside strings.
  • Anton Scheffer's packages for building PDF and XLSX files have been included in the library.
  • The PL_FPDF library by Pierre-Gilles Levallois is a port of the FPDF library for PHP. Pierre-Gilles Levallois has his own website, but his package (which is open source under the GNU license) does not appear to have been updated for several years, and several individuals have been making their own fixes and enhancements to it. Rob Duke and Brian McGinity have both contributed bug fixes and enhancements (such as internal document links, Javascript support, and using CLOBs to work around the 32k limit). These changes, as well as some of my own, have been merged and included in the Alexandria library as PDFGEN_PKG. You'll find this package in the "extras" folder (it is not installed by default if you run the install script).

Friday, July 29, 2011

Count lines in multiple files using Windows command prompt

Not really Oracle-related, but I'm posting this as a reminder to myself and possibly useful to others.

To count the number of lines in a given set of files using the Windows command prompt, do the following:

for %G in (*.sql) do find /c /v "_+_" %G

This invokes the "find" command once for each file, counting the lines that do NOT contain the string "_+_" (the string itself has no special significance; any weird string that would not occur "naturally" in the files can be used).

There are probably more sophisticated ways of doing this, perhaps using PowerShell and whatnot. Leave a comment if you know of other methods (something that adds together all the file line counts into one grand total would be even better).

Mythbusters: Stored Procedures Edition



These days, the use of database stored procedures is regarded by many as a bad practice.

Those who dislike stored procedures tend to regard them as incompatible with the three-tier architecture:

By breaking up an application into tiers, developers only have to modify or add a specific layer, rather than have to rewrite the entire application over. There should be a presentation tier, a business or data access tier, and a data tier.

This is illustrated as follows:



Note that the "tiers" in the figure should actually be labelled "layers", for as the accompanying Wikipedia article says:
The concepts of layer and tier are often used interchangeably. However, one fairly common point of view is that there is indeed a difference, and that a layer is a logical structuring mechanism for the elements that make up the software solution, while a tier is a physical structuring mechanism for the system infrastructure. 

In fact, those who argue that stored procedures are bad tend to equate the three logical layers with three physical tiers:


  1. Data layer = data tier (database)
  2. Logic layer = middle tier (application server)
  3. Presentation layer = presentation tier


But if we accept the above definition of "layers" and "tiers", it is obvious that the following is a valid mapping as well:


  1. Data layer = data tier (database)
  2. Logic layer = data tier (database)
  3. Presentation layer = presentation tier


In other words, the database becomes our "logic layer" through the use of database stored procedures, which, as the name implies, are physically stored (and executed) in the database. (And although I use the term "stored procedure", I'm primarily talking about Oracle and PL/SQL, where the PL/SQL code should be put in packages rather than stand-alone procedures.)

But why is this a bad idea? In fact, as it turns out, it might not be a bad idea at all. The usual reasons given against the use of stored procedures for "business logic" (or for anything at all, really) tend to be myths (or outright lies), repeated so many times that they are taken as the truth.

So let's bust these myths, once and for all. And whenever someone argues against stored procedures using one of these myths, just give them a link to this blog post. (And leave comments to prove me wrong, if you will.)


Myth #1: Stored procedures can't be version controlled



Stored procedure code lives in text files, which can be version controlled like any other piece of code or document. Storing/compiling the code in the database is just like (re-)deploying any other code.

Claiming that stored procedures cannot be version controlled (because they are in the database) is like saying your application source code (Java, C# or whatever) cannot be version controlled because it is compiled and deployed to an application server.




Myth #2: Managing the impact of changes in the database is hard



Databases such as Oracle have built-in fine-grained dependency tracking.


A wealth of information about your code is exposed via data dictionary views.
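For example, finding every piece of stored code that depends on a given table is a single query against the standard USER_DEPENDENCIES dictionary view (EMP is just an illustrative table name):

select name, type
from   user_dependencies
where  referenced_name = 'EMP'
and    referenced_type = 'TABLE';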




Myth #3: Database tools lack modern IDE features





There are a number of free and commercial PL/SQL code editors and IDEs, and all have various levels of syntax highlighting, code insight and refactoring support.




Myth #4: Stored procedures always result in spaghetti code



To this, I can only say that bad programmers can make pasta in any language (the above is a visual representation of a Java or .NET enterprise framework "several dozen megabytes chock full of helper classes like IEnterpriseAuthenticationProviderFactoryManagementFactory").

And a good programmer can create "beautiful" code in COBOL, Visual Basic, PHP... and any stored procedure language, for that matter.





Myth #5: Code in the database can’t be properly encapsulated and reused, you need an object-oriented language for that






PL/SQL packages, views, pipelined functions and Ref Cursors offer encapsulation and reuse. And PL/SQL has object-oriented features, too.
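As a minimal sketch of what encapsulation looks like in PL/SQL (the table, package and procedure names are made up for illustration): callers see only the package specification, while helper routines stay private in the body.

create or replace package employee_api as
  -- the public interface: this is all that callers ever see
  procedure give_raise (p_emp_id in number, p_percent in number);
end employee_api;
/

create or replace package body employee_api as

  -- private helper, invisible and inaccessible outside the package
  function is_eligible (p_emp_id in number) return boolean is
  begin
    return true; -- the real business rule goes here
  end is_eligible;

  procedure give_raise (p_emp_id in number, p_percent in number) is
  begin
    if is_eligible (p_emp_id) then
      update emp
      set    sal = sal * (1 + p_percent / 100)
      where  empno = p_emp_id;
    end if;
  end give_raise;

end employee_api;
/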




Myth #6: Stored procedure languages are primitive, they lack basic features such as exception handling and dynamic execution



PL/SQL has had proper exception handling from the start, over 20 years ago (although exception handling was only introduced to SQL Server in 2005).



DBMS_SQL, EXECUTE IMMEDIATE and "weak" Ref Cursors enable dynamic execution of code. Parameter overloading and the ANYDATA and ANYTYPE types allow for generic code to be written.
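Here is a small sketch showing both features at once: dynamic SQL executed with a bind variable, wrapped in an exception handler (the table name is illustrative):

declare
  l_count number;
begin
  -- dynamic execution with a bind variable
  execute immediate 'select count(*) from emp where deptno = :b1'
    into l_count
    using 10;
  dbms_output.put_line ('Rows found: ' || l_count);
exception
  when others then
    -- proper exception handling, with access to the error message
    dbms_output.put_line ('Error: ' || sqlerrm);
    raise;
end;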




Myth #7: Debugging stored procedures is hard/impossible



Both Oracle and SQL Server have built-in debugging capabilities, exposed via graphical user interfaces in the common IDEs, with full support for stepping through code, inspecting values, etc.




Myth #8: Stored procedures can't be unit tested



There are a number of free and commercial unit testing frameworks available for PL/SQL. Steven Feuerstein, one of the world's leading experts on the Oracle PL/SQL language, has been preaching the importance of unit testing in the database for years, and has developed several of the available unit testing frameworks.




Myth #9: Stored procedures are not portable, and tie you to one platform

This is the "vendor lock-in" argument. But the fact is that PL/SQL runs on multiple databases.

Such as DB2:

"IBM DB2 9.7 for Linux, UNIX, and Windows has out-of-the-box support for Oracle's SQL and PL/SQL dialects. This allows many applications written against Oracle to execute against DB2 virtually unchanged."
And Postgres (EnterpriseDB):

"Postgres Plus Advanced Server implements a comprehensive suite of Oracle-compatible functionality within and around the core PostgreSQL engine, including: (...) Oracle SQL syntax and semantics, Functions and Packages, PL/SQL (extensive support)"
Add to this the fact that the Oracle database runs on more operating systems than any other database, which means that your PL/SQL code will seamlessly transfer from Windows to Unix to Linux-based systems.



So PL/SQL-based code can actually be said to be more portable than, for example, .NET code (despite the existence of Mono). There are very few truly portable technologies; even Java is "write once, debug everywhere".




Myth #10: It's stupid/dangerous to put business logic in the database

This claim is usually made without any specific reason as to why it is stupid or dangerous. It usually "just is", because it is "against best practice" and "everybody else is putting the business logic in the middle tier". Sometimes it is claimed that putting logic in the database "mixes concerns", which must be a bad thing.

The problem with "business logic" is that nobody has a clear definition of what it is (but "you'll know it when you see it"). For example, where do you draw the line between "data logic" and "business logic"? Primary keys, foreign key constraints, unique key constraints, not null constraints, check constraints -- are these "data logic" or "business logic"? "Discount must be between 0% and 5%", is that a business rule or a data constraint, and/or is it a validation rule in the presentation layer?
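To make the discount example concrete: the very same rule can be expressed declaratively in the data tier with one line of DDL (hypothetical table and column names):

alter table orders
  add constraint orders_discount_ck
  check (discount_pct between 0 and 5);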

The fact is, if you move ALL your logic into stored procedures, you entirely avoid the "mixing of concerns" between the data tier and the logic tier. (And if you think such an approach dooms your project to failure, consider the next myth, which features an example of a massive [and wildly successful] application written entirely in the database.)



Oh, and by the way, if your business logic is somewhere else than in the database, you always run the risk of someone or something bypassing your middle tier (for example by logging in with SQL*Plus), directly updating the database and possibly corrupting the data.

So let's turn this around and conclude instead that:

"If your business logic is not in the database, it is only a recommendation."



Myth #11: Stored procedures can't scale

A frequent argument against stored procedures is that by placing all the work in the database server, your solution won't be able to scale up, because you need "application servers" in the middle tier to do that. The scalability of the database is limited by the fact that you can only have a single database server (or you need to rewrite your code to work with partitioned/sharded databases, as Facebook has done).

Of course, a lot of the people who throw around this kind of argument have never worked on an application or website which needed to scale up to millions of users (and to be clear, neither have I). That's because the vast majority of us work on much smaller enterprise business systems or "normal" websites (perhaps even the kind of website that can be well served with free database software on a server with less juice than your laptop).

But stored procedures CAN scale. It's only a matter of money. And if you have millions of users, you should be able to afford decent hardware.

Let's use Oracle Application Express (Apex) as an example of a big and complex PL/SQL application:

"[Application Express] lives completely within your Oracle database. It is comprised of nothing more than data in tables and large amounts of PL/SQL code. The essence of Oracle Application Express is approximately 425 tables and 230 PL/SQL packages containing 425,000+ lines of code."

This PL/SQL application can be deployed anywhere from your laptop to any kind of server:



The biggest and fastest server you can buy is currently Oracle Exadata.

"An 8 rack configuration has a raw disk capacity of 3,360 TB and 1,680 CPU cores for SQL processing. Larger configurations can be built with additional InfiniBand switches."

Oracle makes bold claims about this machine:

"Oracle claims that (..) two Exadata database systems would be able to handle Facebook’s entire computing load."

It's hard for me to verify that claim, being associated with neither Oracle nor Facebook, but let's assume it has at least some truth to it.

So what about running our "stored procedure" application on Exadata?

"Does APEX Work on Exadata?
"Yep, Exadata is just Oracle. Let me say that again: It’s just Oracle. Exadata runs an 11.2 Database on Linux x64. It’s the exact same binary install if you download those binaries for “generic” Linux x64 11.2 from OTN. So, if your code / app runs on 11.2, it runs on Exadata. (..) The APEX Dev Team (my old friends and colleagues) did absolutely nothing to port their code to Exadata. I've run a number of customer benchmarks with customer's data and queries and have yet to make a single change to their queries or structures to make them work on Exadata."

So... without changing a single one of those 425,000 lines of code, this "stored procedure" application can run on my old laptop (I've even tried it on an Asus EEE netbook), or it can run with 1,680 CPU cores. Without offloading any logic to an application server.

I'd say that's pretty scalable.

Tuesday, May 24, 2011

Mobile device support in Apex 4.1

The current Apex Statement of Direction for Apex 4.1 states that it will "include themes and HTML templates suitable for smart phones and mobile devices".

If you are wondering what that means, then check out this thread on the Apex OTN Forum where Marc Sewtz, one of the developers on the Apex team, provides more details about this new feature.

Interestingly, some of these "mobile-enabling" features are also relevant for regular applications, such as the ability to render a form without a table grid, enhanced label templates, and dynamic (SQL-based) lists.

Thursday, April 14, 2011

Just give me a hash table and a shitload of RAM

In this interview, James Gosling, the "father of Java", says:

I’ve never got it when it comes to SQL databases. It’s like, why? Just give me a hash table and a shitload of RAM, and I’m happy. And then you do something to deal with failures.

Right... just "do something to deal with failures". And maybe add some other useful stuff. Shouldn't take too long to implement and make sure it works as intended.

But Mr. Gosling, could you please give me a cost estimate and a delivery date for the data-centric business application I want you to build for me?

Monday, March 28, 2011

Amazon S3 API for PL/SQL



Amazon S3 is part of Amazon's Web Service offering and the name is an abbreviation for Simple Storage Service:
"Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers."
A few months ago, Jason Straub published an Oracle whitepaper on how to integrate Oracle Application Express (Apex) with Amazon S3.

As Jason points out, Amazon has a Free Usage Tier which allows you to get started using Amazon S3 for free. If you have ever bought a book from Amazon, they already have your credit card on file, so signing up for Amazon Web Services is quick and easy (and they won't start charging your credit card until the free trial period is over).

Introducing the S3 API for PL/SQL

Inspired by Jason's whitepaper, I decided to write a stand-alone PL/SQL API for Amazon S3. This API can be used in any PL/SQL solution, with or without Apex.

The API supports all common S3 operations, including

  • Authentication
  • Creating new buckets
  • Listing existing buckets
  • Listing existing objects (with or without filtering)
  • Creating (uploading) new objects (and setting an Access Control List - ACL)
  • Generating download links (with or without expiry dates)
  • Downloading objects
  • Deleting objects


See the examples below for more details.

Use Cases

So what can you do with Amazon's S3 service in combination with a PL/SQL API?

I can think of several interesting use cases, some of which I might explore further in future posts:

  • Backing up your database (use DBMS_DATAPUMP to dump a file to disk, then compress it using ZIP_UTIL_PKG, then encrypt it using CRYPTO_UTIL_PKG, and upload it to S3)
  • Backing up your PL/SQL source code (use data dictionary views or DBMS_METADATA to extract the source code, optionally zip and/or encrypt it, and upload to S3)
  • Backing up your Apex applications (use WWV_FLOW_UTILITIES.EXPORT_APPLICATION_TO_CLOB to generate export file, optionally zip and/or encrypt it, and upload to S3)
  • Cloud storage for file uploads (instead of storing [large] files inside your database, store them in the cloud and download them on demand -- especially relevant for Oracle XE which has a file size limit)
  • Serve static content (generate static [text, CSV, HTML, PDF] files from the database and upload to S3)
  • Replication or shared storage (upload from one machine/database, download to another)
  • Data loading or message processing (set up to poll for new incoming files - uploaded by other S3 clients - and process them)

Remember that all these things can be scheduled to run in the database using DBMS_JOB or DBMS_SCHEDULER.
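For example, a nightly backup-to-S3 job could be set up along these lines (a sketch; BACKUP_TO_S3 is a hypothetical procedure wrapping the steps described above):

begin
  dbms_scheduler.create_job (
    job_name        => 'nightly_s3_backup',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin backup_to_s3; end;',
    start_date      => systimestamp,
    repeat_interval => 'freq=daily; byhour=2',
    enabled         => true
  );
end;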


Where to get the Amazon S3 API for PL/SQL

You can download the API as part of the Alexandria Utility Library for PL/SQL.


Getting started

Download and compile the relevant PL/SQL API packages. Then register with Amazon for an S3 account and get your AWS keys (key and secret key), and log in to the AWS Management Console to get familiar with the basic operations.



If you are unfamiliar with Amazon S3, I recommend that you read this short getting started guide that describes the common operations.

In the following examples we shall see how you can do the same operations using PL/SQL.


Authentication

From your Amazon account control panel, you'll get the key strings you need to use the Amazon web services.

Before you call any of the following API methods, you must initialize the authentication package. You only have to do this once per database session (but remember, on the web, every page view is a separate database session, so in Apex you'll need to run this code for every page, typically as a Before Header page process).
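A sketch of the initialization call (I'm assuming the AMAZON_AWS_AUTH_PKG package name from the Alexandria library; check the package specification in your version for the exact signature):

begin
  -- substitute your own AWS key and secret key here
  amazon_aws_auth_pkg.init ('my_aws_key_id', 'my_aws_secret_key');
end;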



Creating new buckets

Buckets are what you use to organize your objects (files) in Amazon S3. Think of them as top-level folders, but note that you cannot create more than 100 buckets in a single account, and the bucket name must be unique across all user accounts on Amazon S3. So creating buckets is not really something you'd do very often, and usually it will be done manually (to resolve any name conflicts with existing buckets).

A bucket is associated with a specific region where your objects will be stored. For reasons of latency/speed and possibly legal issues, it makes sense to select a region that's close to you and your users (although you may actually want to locate it far away if the purpose is backup for a major disaster in your own area).

Here's how to create a new bucket via PL/SQL code:
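Something like this (a sketch; the package name is from the Alexandria library, and the region value follows Amazon's own naming, so verify both against your package specification):

begin
  -- create a new bucket in the EU region
  amazon_aws_s3_pkg.new_bucket ('my-unique-bucket-name', 'EU');
end;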


Checking the AWS management console to verify that the bucket has indeed been created (in the specified region):



Listing existing buckets

With one or more buckets created in your account, you can list the bucket names.

There are two ways to do this, either by retrieving an index-by PL/SQL table using the GET_BUCKET_LIST function:
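A sketch of the first approach (the collection type and field names are my assumptions based on the Alexandria package specification):

declare
  l_bucket_list amazon_aws_s3_pkg.t_bucket_list;
begin
  l_bucket_list := amazon_aws_s3_pkg.get_bucket_list;
  for i in 1 .. l_bucket_list.count loop
    dbms_output.put_line (l_bucket_list(i).bucket_name);
  end loop;
end;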



or, alternatively, via SQL using a pipelined function named GET_BUCKET_TAB:
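Which might look like this (same caveat as above regarding exact names):

select *
from   table (amazon_aws_s3_pkg.get_bucket_tab);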



Creating (uploading) new objects

An "object" is a file, and this is really what the S3 service is all about, storing files. So let's upload a file or two to our new bucket!

The API lets you upload any BLOB data to S3 using the NEW_OBJECT procedure.
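A sketch of an upload (the BLOB is assumed to already hold your file contents, loaded from a table, DBMS_LOB.LOADBLOBFROMFILE or similar; the parameter order is my recollection of the Alexandria package specification):

declare
  l_blob blob;
begin
  -- l_blob is assumed to contain the file data at this point
  amazon_aws_s3_pkg.new_object ('my-unique-bucket-name', 'myfolder/myfile.txt',
                                l_blob, 'text/plain');
end;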



When you upload a file to S3, the default Access Control List (ACL) makes sure that only the owner of the file (you!) can access (download) it.

Others get an "Access Denied" message (but see the "Generating download links" section for how to generate special time-limited download links):



There are a number of predefined ACLs that you can specify if, for example, you want to make the file publicly available.
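For example, passing the standard S3 canned ACL 'public-read' when uploading (the P_ACL parameter name is an assumption on my part; check the package specification):

declare
  l_blob blob;
begin
  -- 'public-read' is one of the canned ACLs defined by Amazon S3
  amazon_aws_s3_pkg.new_object ('my-unique-bucket-name', 'myfolder/public_file.txt',
                                l_blob, 'text/plain', p_acl => 'public-read');
end;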



The file can then be freely downloaded by anyone (the use of HTTPS is optional).


A note about "folders": S3 has no concept of "folders", but you can simulate folders by using a forward slash in your file names (as seen in the previous example). Some S3 clients, such as the AWS management console, will present such files in a folder structure. As far as the PL/SQL API is concerned, the slash is simply part of the file name and has no special meaning.

Listing existing objects

Now that we have uploaded a couple of files, we can list the contents of the bucket via the GET_OBJECT_LIST function:
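For example (the field names of the object record are my assumptions based on the Alexandria package specification):

declare
  l_object_list amazon_aws_s3_pkg.t_object_list;
begin
  l_object_list := amazon_aws_s3_pkg.get_object_list ('my-unique-bucket-name');
  for i in 1 .. l_object_list.count loop
    dbms_output.put_line (l_object_list(i).key || ' (' || l_object_list(i).size_bytes || ' bytes)');
  end loop;
end;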



You can also get a list in SQL via a pipelined function named GET_OBJECT_TAB:
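A sketch:

select *
from   table (amazon_aws_s3_pkg.get_object_tab ('my-unique-bucket-name'));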



In both cases, you can optionally specify a prefix that acts as a search filter for the file names you want to return, and/or the maximum number of items you want to return.



Generating download links

You can access a file that has been protected by an ACL by including a special checksum parameter in the URL.

The GET_DOWNLOAD_URL function lets you generate the URL needed to access the file. You can specify when the link should expire, so this means you can share a download link that will stop working after a specified amount of time, which can obviously be useful in a number of scenarios.
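A sketch (I'm assuming the expiry is passed as a DATE; check the package specification in your version):

declare
  l_url varchar2(4000);
begin
  -- generate a link that expires 24 hours from now
  l_url := amazon_aws_s3_pkg.get_download_url ('my-unique-bucket-name',
                                               'myfolder/myfile.txt',
                                               sysdate + 1);
  dbms_output.put_line (l_url);
end;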



Pasting the generated URL into the browser allows us to access the file:



Downloading objects

Downloading a file from S3 using PL/SQL is straightforward with a call to the GET_OBJECT function which returns a BLOB:
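For example:

declare
  l_blob blob;
begin
  l_blob := amazon_aws_s3_pkg.get_object ('my-unique-bucket-name', 'myfolder/myfile.txt');
  dbms_output.put_line ('Downloaded ' || dbms_lob.getlength (l_blob) || ' bytes');
end;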



Deleting objects

Removing a file is likewise very simple, just call the DELETE_OBJECT procedure:
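For example:

begin
  amazon_aws_s3_pkg.delete_object ('my-unique-bucket-name', 'myfolder/myfile.txt');
end;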




Summary

The ability to upload and download any file from the Oracle database to "the cloud" (and vice versa) via PL/SQL is extremely useful for a number of purposes.

Let me know if you find this API useful!

Monday, March 14, 2011

How I hacked the Apex 4 Websheets

Or, to be more specific (and less sensationalistic), how I struggled to, and finally succeeded in, using the new Apex 4 Websheets feature in a "Runtime-Only" environment.

What is an Apex runtime environment?

"Oracle recommends that you run any sensitive production Oracle Application Express applications with a runtime installation of Oracle Application Express. A runtime installation does not expose the Web-based application development environment, thus preventing the use of Application Builder, SQL Workshop, and related utilities on a production installation. Additionally, a runtime environment only includes the Oracle Application Express database objects and privileges necessary to run applications, making it a more hardened environment."

So, naturally, I set up Apex production environments using the runtime-only installation. But I hit a problem when I tried to deploy a Websheet application to such a production environment.

The setup

Here is what I did:

  1. Installed Oracle XE
  2. Installed Apex 4 runtime (apxrtins.sql)
  3. Manually created an application schema (SMALL_APPS) in the database
  4. Manually created the Websheet tables (APEX$ tables) in the SMALL_APPS schema (to do this I had to manually export the table scripts from the Apex development instance using TOAD, but I guess it could also have been done through the SQL Workshop in Apex itself)
  5. Manually created an Apex workspace for the SMALL_APPS schema (via the APEX_INSTANCE_ADMIN package)
  6. Exported the Workspace definition from the development instance. The generated workspace script contains the statements to create the workspace users. I removed the part of the script that creates the workspace itself (as I had already done this in the preceding step), and changed the workspace ID to the newly created workspace ID before I ran the script.


So far, so good.

The problem

I then proceeded to import the actual Websheet application.

This, however, threw up the following error:

WEBSHEET APPLICATION 112 - WebSheetSandbox
Set Credentials...
Check Compatibility...
API Last Extended:20100513
Your Current Version:20100513
This import is compatible with version: 20100513
COMPATIBLE (You should be able to run this import without issues.)


Set Application ID...
...Remove Websheet Application
...Create Websheet Application
...Create Access Control List
...Create Application Authentication Set Up
...Create Data Grid
Rollback

Error starting at line 163 in command:
declare
  q varchar2(32767) := null;
begin
q := null;
wwv_flow_api.create_ws_worksheet (
  p_id => 1311502709921962+wwv_flow_api.g_id_offset,
  p_flow_id => 4900,
  p_page_id => 2,

(snip...)

ORA-02291: integrity constraint (APEX_040000.WWV_FLOW_WORKSHEETS_FLOW_FK) violated - parent key not found
ORA-06512: at "APEX_040000.WWV_FLOW_API", line 14562
ORA-06512: at line 5
02291. 00000 - "integrity constraint (%s.%s) violated - parent key not found"
*Cause:    A foreign key value has no matching primary key value.
*Action:   Delete the foreign key or add a matching primary key.


From this it was evident that the data grids in the exported application are linked to application 4900, which is the built-in Websheets application. However, the Apex runtime-only installation does not install application 4900, hence the integrity error.

At this point I started to wonder if Websheets are supported in a runtime-only installation of Apex, and I posted the question in the Apex discussion forum on OTN.

But the only answer there was silence, so after a few days I decided to just go ahead and try to install application 4900 in the runtime environment, by running the f4900.sql script (from the apex\builder folder).

More problems

With application 4900 installed successfully, I was able to install my own Websheets application.

However, after login I was greeted with the following error message:

ORA-06550: line 9, column 46: PLS-00201: identifier 'WWV_FLOW_F4000_PLUGINS.RENDER_SEARCHBOX' must be declared ORA-06550: line 9, column 1: PL/SQL: Statement ignored

So, a package is missing. I located the package scripts (in the apex\core folder).

The comments in the package header (wwv_flow_f4000_plugins.sql) actually state:

"RUNTIME DEPLOYMENT: YES"

But evidently the package is not installed by the runtime installation script, so either this is a bug or the comment is wrong.

So I added the missing package by running:

alter session set current_schema = APEX_040000;


@wwv_flow_f4000_plugins.sql
@wwv_flow_f4000_plugins.plb

The final problem

This fixed the PLS-00201 error, but now I got several other errors, all similar to the following:

Unable to bind ":WS_APP_ID"
Unable to bind ":WEBPAGE_ID"
ORA-20001: run_query error q=select id, title, content, section_type, data_grid_id, report_id, data_section_style, nav_start_webpage_id, nav_max_level, nav_include_link, created_by from apex$_ws_webpg_sections where ws_app_id = :WS_APP_ID and webpage_id = :WEBPAGE_ID order by display_sequence
ORA-01003: no statement parsed


Still no feedback in the discussion forum, so I decided to dig deeper into the Apex internals (an interesting exercise in itself).

A solution!

In the end, the solution turned out to be simple. The WWV_FLOW_COMPANY_SCHEMAS table contains the workspace to schema mappings. This table contains a column called IS_APEX$_SCHEMA, and this needs to be set to "Y" (the APEX_INSTANCE_ADMIN.ADD_WORKSPACE procedure leaves the column value as NULL).

So just update the column to enable Websheets:

update wwv_flow_company_schemas
set is_apex$_schema = 'Y'
where schema = 'SMALL_APPS';

Voila! I now have a working Websheet application in my runtime-only Apex environment.


Postscript

While I was typing up this for the blog post, I stumbled across the following statement in the Apex Administration Guide:

"Tip: Websheets are not supported in an Oracle Application Express runtime environment."

I wish somebody could have pointed that out to me in the discussion forum thread. But then again, if they did, I probably wouldn't have discovered how to make it work anyway.

And if anyone from Oracle is reading this, consider this an enhancement request for Apex 4.1: Support Websheets in a runtime-only environment by:

  • Including application 4900 in the runtime installation
  • Including the wwv_flow_f4000_plugins package in the runtime installation
  • Adding a parameter to the ADD_WORKSPACE and ADD_SCHEMA procedures to specify whether websheets should be enabled or not

Tuesday, March 1, 2011

Stress Testing Oracle XE 10g

Oracle Express Edition (XE) is Oracle's free entry-level database product, currently available only in a 10g version. XE is usually pitched as suitable for personal work or (very) small departmental applications, but I was curious as to what kind of load it can support.

Oracle XE 10g has the following limitations:

  • Up to 1 instance per server
  • Up to 1 CPU (will not use more even if available)
  • Up to 1 GB RAM (will not use more even if available)
  • Up to 4 GB datafiles (not including XE system data)
  • Free to develop, distribute and deploy in production


By the way, several things seem to indicate that Oracle Express Edition (XE) 11g is just around the corner. It's rumored that XE 11g will raise the datafile limit from 4 GB to 11 GB, but the other limits will remain as far as I know (and I don't really know anything about it...!). So these performance tests will probably be representative both for XE 10g and 11g, although I can't know for sure until 11g is available.

Here is the test environment I set up:

Hardware

Note that the "hardware" in this case actually runs virtualized in a VMWare environment.
Here are the resources allocated to the server:

  • 1 Intel Xeon X7350 2.9 GHz CPU
  • 4 GB RAM (but remember that XE won't use more than 1 GB anyway)
  • 30 GB disk space


Software

The software setup was as follows:

  • Windows Server 2003 R2 Standard Edition Service Pack 2
  • Microsoft Internet Information Server (IIS) 6.0
  • Thoth Gateway
  • Oracle Express Edition (XE) 10g
  • Oracle Application Express (Apex) 4.0


Note that I am not using the Embedded PL/SQL Gateway (DBMS_EPG) as the web server, but rather IIS in combination with the Thoth Gateway, a free ASP.NET replacement for mod_plsql and DBMS_EPG that I wrote in C# using ODP.NET.

Everything (database, web server, Apex) was installed using default settings.
The Apex images folder (/i/) was set up with "Expires"-headers (7 days) to allow browsers to cache images.

Both the database and the web server run on the same machine (server), in order to keep the setup as simple as possible (and show what is possible with just one box).

Test page

I set up an Apex test page with a mix of regions, including a report region, a couple of PL/SQL regions and an HTML region. For each page view, a random number is generated (to simulate different users looking at different things). The report query selects up to 200 rows based on this random number (using bind variables, of course), out of a system table/view (DBA_OBJECTS) with around 17,000 total rows.



There is also an after footer page process that inserts a row in a log table, so this is not just a read-only page.

The application was set up with an authorization scheme of "No application authorization scheme required" to allow it to be accessed from the online stress test tool.

Stress test tool

LoadImpact is a user-friendly, web-based stress testing service that allows you to run free tests that simulate up to 50 concurrent clients (you can pay to test with more clients). A key point here is that the load is actually generated from their test servers on the public internet, so this gives a more realistic test than simply running a stress testing tool on your local network.

You don't even need to sign up, just enter the URL of a website to test it (apparently, there are safeguards to prevent this from being used as a denial-of-service attack tool).

A note regarding image caching: The LoadImpact service does not cache anything. This means that all the Apex images, Javascript and stylesheets will be loaded on every page hit, even though in reality they would be cached in the user's browser after the first page view. Since I wanted to test the database's ability to handle page views, rather than IIS's ability to serve up static files, I changed the Apex page so that the standard Javascript and CSS files were not included. I then manually added the core stylesheets (core Apex + theme) back on the page via the page template, just to preserve the look and layout of the page. But even these could have been left out in order to simulate a scenario where most users have the files cached in their browser.

Test results

The free test at LoadImpact runs five different subtests, starting with 10 concurrent clients and ending with 50 concurrent clients. The full test takes 10 minutes, divided into five 2-minute parts with increasing load.

While this was going on, I watched the CPU usage in the Windows Task Manager and grabbed the following screenshots:





While there are some peaks at the start of each subtest, it seems that the database "warms up" and actually does less work in the middle and at the end of each subtest compared to the start. Even at 40 and 50 clients, the CPU is (on average) not working at more than half capacity.

Here is the summary from LoadImpact at the end of the test:



First of all, note the nice and almost flat curve, with response times of under 300 ms even with 50 clients (and remember that the load generator is actually located in a different country, accessing the Apex page via the public Internet).

Actually, while the test was running, I was also clicking through a separate Apex application (an actual application, not just a test page) on the same server, and I couldn't really notice that the server was under any kind of stress; the response times were excellent.

The total number of requests for the whole test was 4500, but these requests include not only the actual page (the call to the "f" procedure) but also 2 stylesheets and 3 images. So the database was hit on 1 in 6 requests, in other words 16.7% of 4500, or 750 times. Each hit inserted a row into a log table. If we query that log table and group the rows down to the second, we can get the number of database requests per second:
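A sketch of such a query (the log table and column names are hypothetical):

select to_char (created_date, 'hh24:mi:ss') as time_of_day,
       count(*) as page_views
from   load_test_log
group by to_char (created_date, 'hh24:mi:ss')
order by 1;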



This shows that the XE database was handling 10 or 11 page views per second. If this can be sustained over time, that is 36,000 page views per hour, or 864,000 page views per day.

By comparison, apex.oracle.com is reported to have 4,800,000 page views per week or 685,000 page views per day. And according to this, Reddit has 10 pageviews per second per server (with 80 servers), while Facebook has just 2 pageviews per second per server (with 30,000 servers!).

I won't read too much into these numbers (as obviously a "page view" can vary wildly between different websites, and only actual use can tell whether your site is performing well or not), but clearly the free XE database is more than a toy.

I'm even tempted to buy a Basic or Professional test at LoadImpact to test with more than 50 clients and see where that curve leads with 60, 80 or 100 concurrent clients...