More Radius Search Fun

In my previous post, I outlined the process which I'm using to speed up radius searches over a large pool of potential matches. With that I was able to answer the question "How many within n miles?" More recently I've been asked to answer the question "What is the minimum radius which contains n records?"

Example Radius Plot

Example Radius Plot

So, let's say that the red dot represents 2 records and the green dot represents 3 records. If I need to find the minimum radius to contain 4 records, then I'll need to go out to the green dot to satisfy that requirement. I think it's time I introduced my friendly query:

WITH available_records_by_zip as (
      select zip, count(1) locations
      from records
      inner join addresses a using(address_id)
      group by zip
)
SELECT origin_zip, radius
  FROM (
    SELECT
      origin_zip,
      radius,
      cnt,
      row_number() over(PARTITION BY origin_zip ORDER BY radius) rnk
    FROM (
      SELECT
          zr.zip_from origin_zip,
        radius,
        SUM(available_records_by_zip .locations) over( --This is the analytic function
          PARTITION BY  zr.zip_from                    --which makes this possible.
          ORDER BY radius
        ) cnt
      FROM
      zip_radius_map zr,
      available_records_by_zip
      where zr.zip_from=:search_zip
      and zr.zip_to= available_records_by_zip.zip
    )
    WHERE cnt >= 4 -- Number of locations needed
  )
WHERE rnk = 1

This is what it does: Given a pool or records, group then by origin zip and order them by distance. Only pull those which meet the minimum requirement by summing their count with all of the counts before them. Order that set by radius and take the first one. That guy is your minimum requirement.

In the real world, I'm joining to a table of target zips, but you get the idea. Doing it this way(cheating), I can do a single analysis for a few thousand target zips in a handful of seconds. The time it takes really just depends on the size of your set of available providers. These Oracle analytics functions are very powerful once you understand them.

What I’ve Learned as an Accidental DBA

Working for a small company has it's advantages and disadvantages.  One of the biggest disadvantages has to be the lack of specialist employees.  I wear many hats at work.  Most days I'm a software developer.  Some days I get to be a network administrator.  When a server is down, then I'm the systems administrator. One day I was the janitor.  Since I'm big and tall, I'm frequently mistaken for a furniture mover.

It's a blessing and a curse.  There is never a dull moment and always something for me to do.  I've acquired many skills out of necessity.  One of those skills has been database administration.  More specifically, Oracle administration.  I've learned to hate the database server.  When something goes wrong in that box of mystery, everything grinds to a halt and people start getting pissed.  So, in no particular order, I'd like to share some nuggets of information I've learned over the years.

  • Cheap RAID cards are a bitch.  I've had the opportunity to witness the carnage that can happen when good RAID goes bad.
  • A fast disk is a fast database.  Unless you can keep your entire database in memory, your data is going to need to be read from disk.  Disk IO is a major bottleneck.
  • RAID is awesome when it works.  What's faster than one fast disk?  More fast disks and a fast RAID controller.  More is better. Just make sure you invest in quality equipment.  Good RAID controllers aren't cheap for a reason.  The expensive ones don't corrupt your data.
  • RAID 10 is better than RAID 5.  The only exception is if you never need to write data with any speed.  Disks are so cheap that it doesn't make any sense to try and save the money.  RAID 10 is just that much faster.
  • Indexes are a double edged sword.  They are necessary for fast queries, but too many will slow inserts and updates.  Also, indexes on large tables should be chosen carefully as the indexes can consume quite a bit of disk.
  • A bad query can bring the database to its knees.
  • Locks protect multiple requests from stomping on top of each other.  They are also good at stopping inserts and updates from happening when one gets stuck.
  • Keeping the statistics up to date makes sure that the database choses an optimal execution plan for retrieving your data.  The database needs to know how your data looks. 
  • Full table scans aren't always bad, but most of the time they are.  The only time they aren't is generally when you are querying a majority of the data in the table.
  • Be careful where you store your LOBs. Storing LOBs that you don't always need access to inline with other data can be slow.  It reduces row density per data block which means more disk IO is needed to get your data.
  • There are about 42 different ways to query the same data.  Most of them are terrible.

Obviously you should take this information for what it is.  This is just a few things I've come to believe based upon 5 years of banging my head against a wall.  I write software, and somehow I just inherited this database responsibility.  I'm not sure how, I think I was just closer to the server when it died the first time.  

Most of the problems I've had with databases have been hardware related. A good backup strategy goes a long way to keeping your ass out of the fire.  It's a terrible feeling when you lose your own data.  It's an even worse feeling when you lose someone else's data.  If you are an accidental DBA, I feel your pain.  We should all start a support group.

Code Dump: NPI PL/SQL Validation

I just want to throw this out there since it's useful. I searched and didn't come up with an Oracle specific implementation of this algorithm, so here it is. This function checks if a given National Provider Identifier(NPI) is valid. The Luhn Algorithm is used to validate NPI after a prefix of "80840" is applied.

This code is provided as is. I'm not making any guarantees that it's 100% right and I'm not responsible if anything bad happens as a result of your use of this. So, don't yell at me if small animals are harmed with the use of this code.

--Returns 1 if valid, 0 if invalid
CREATE OR REPLACE FUNCTION valid_npi (npi in varchar)
RETURN number
IS
    tot NUMBER := 24; --80840 prefix for npi
    val NUMBER := 0;
BEGIN
  if(
    npi is null or
    length(npi) <> 10 or
    LENGTH(TRIM(TRANSLATE(npi, '0123456789', ' ')))>0 --not numeric
  ) then
    return 0;
  end if;

  for i IN reverse 1 .. LENGTH(npi) loop
    val:=SUBSTR(npi, i, 1);
    if mod(i,2)=1 then
      val := val * 2;
      if(val > 9) then
        val:=val-9;
      end if;
    end if;

    tot := tot + val;
  end loop;

  if(MOD(tot, 10)=0) then
    return 1;
  else
    return 0;
  end if;
END valid_npi;

Random Oracle Tip: Duplicating a Table Structure

I think this is well known, but since I use it from time to time, I'm going to repeat it. If you need to duplicate a table's structure for some reason, then all you need to do is this:

CREATE TABLE new_table as (select * from old_table WHERE 1=0)

This will copy the table structure without any of the data since one can never equal zero.

Oracle 9i NULL Behavior

This trips me up from time to time, so I wanted to document this for my own benefit and share my findings. Oracle treats NULL different from any other value. Any time you make a direct comparison to a NULL value, the end result is NULL.

This topic is best shown by example.

SELECT *
FROM records
WHERE NULL=NULL;

Returns nothing.

SELECT *
FROM records
WHERE ''='';

Returns nothing. Oracle treats an empty string (‘’) as NULL.

SELECT *
FROM records
WHERE NULL IS NULL;

Returns everything. In order to do a comparison with NULL, you must use “is”.

SELECT *
FROM records
WHERE '' IS NULL;

Also returns everything. This demonstrates Oracle’s behavior of the empty string (‘’) being treated as NULL.

Now it's time for the odd stuff. Whenever you are making list comparisons, you should also be careful of NULL. List comparisons are IN/NOT IN and EXISTS/NOT EXISTS.

SELECT *
FROM records
WHERE unique_id NOT IN (
   SELECT unique_id
   FROM some_other_table
   WHERE 1=0
);

Returns nothing even though you would expect it to. I’m using 1=0 because it is never true. The set will be empty(NULL). The inner query must return something in order for the query to proceed.

SELECT *
FROM records r
WHERE NOT EXISTS (
   SELECT 1
   FROM some_other_table
   WHERE unique_id=r.unique_id
   AND 1=0
);

Returns everything. Exists will evaluate to true if something is returned and will evaluate false if nothing(NULL) is returned. Also, please note that in the inner query I’m referencing a value from the outer query(r.unique_id).

SELECT *
FROM records
WHERE unique_id NOT IN (
   SELECT NULL AS c
   FROM dual
);

Returns nothing. The inner query doesn’t return an empty set, instead it returns a single value which is null. This query will take a while to run. It will scan every record and try to perform the following comparison “unique_id <> NULL”, which will fail for all records for reasons stated in the top set of examples.

SELECT *
FROM records
WHERE unique_id IN (
   SELECT NULL FROM dual
   UNION ALL
   SELECT 11697981 FROM dual --Let us assume this value actually exists in the records table.
);

Returns 1 record. The inner query returns 2 values (NULL, 11697981). Just because the list has a value in it that is null doesn’t mean the entire comparison will fail for that record.

That's all I've got on the NULL situation. Generally I get tripped up on a NOT IN clause or a <> comparison. Hopefully not anymore.