1. Computer problem? Tech Support Guy is completely free -- paid for by advertisers and donations. Click here to join today! If you're new to Tech Support Guy, we highly recommend that you visit our Guide for New Members.

need some SQL help

Discussion in 'Software Development' started by SabreWolf3, Oct 9, 2003.

Thread Status:
Not open for further replies.
Advertisement
  1. SabreWolf3

    SabreWolf3 Thread Starter

    Joined:
    Nov 6, 2000
    Messages:
    227
    Hi all,

    I know BASIC SQL, which is usually all I need to get through my day. However, I have run across a report that needs to be generated that is a bit more complex.

    I would like to learn how to do this, rather than asking my resident SQL guru, because he's not a very nice guy. I am hoping someone here can help.

    I have a database table that has five columns. See image below. The columns are Date, Page_ID, Space_ID, Group_ID and Clicks.

    Here is what I want to do:
    • Only return DISTINCT Page_ID's from the table
    • For each Page_ID, I want a MAXIMUM value of clicks returned for each Date/Page_ID/Space_ID/Group_ID combination.
    • After finding these MAXIMUM values, I want all the maximum values SUMMED for each Page_ID.

    In other words, for the example, my output would be:
    Page_ID ===== Clicks
    123 ===== 2240

    You would have 6 Maximum values:
    10/01/2003 - 123 - 10 - 1 = 420
    10/01/2003 - 123 - 20 - 17 = 200
    10/01/2003 - 123 - 30 - 25 = 500
    10/02/2003 - 123 - 10 - 1 = 420
    10/02/2003 - 123 - 20 - 17 = 200
    10/02/2003 - 123 - 30 - 25 = 500

    The sum of all these values would be 2,240.

    Can anyone help me generate this SQL script?

    [​IMG]
     
  2. Rockn

    Rockn

    Joined:
    Jul 29, 2001
    Messages:
    21,334
    I think you need to do a nested query on the results you already have and then group on the page_id and then sum the max of the hits.
     
  3. SabreWolf3

    SabreWolf3 Thread Starter

    Joined:
    Nov 6, 2000
    Messages:
    227
    Hi Rockn,

    Well, I don't already have any results... the image in the post is just an example that I made up. I need to have a query formulated, because I am turning this report into an automated utility that can be refreshed by the user with the click of a button.

    The reason I need the query to pull distinct page_ids, do the calculation, and roll up to the page_id, is because there are over 100,000 page_id's in this table. If you don't pull distinct values, there will be at least 3 to 4 rows of data per page_id, which is beyond anything that Excel or Access can handle.

    The report will use a date range as criteria, and the date range will be limited to 31 days at a time. This will ensure that there will never be more than 65,000 rows of data, and it can be handled by Excel.

    I'm not sure I understand what a nested query is.
     
  4. Rockn

    Rockn

    Joined:
    Jul 29, 2001
    Messages:
    21,334
    You first run a query to return all of the maximum values for a given Page_ID and a specified date range. Once those values are returned you run another query based on the first to return a sum of all the maximum values for that Page_ID. I did it with Access quite easily last night on a client database. You have any sample data I could play with to show you?
     
  5. Gibble

    Gibble

    Joined:
    Oct 9, 2001
    Messages:
    27,087
    This should work:

    PHP:
    SELECT Page_IDsum(Max_ClicksFROM (
      
    SELECT Date
             
    Page_ID
             
    Space_ID
             
    Group_ID
             
    max(Clicks) AS Max_Clicks 
      FROM 
    --TABLENAME-- 
      
    GROUP BY Date
             
    Page_ID
             
    Space_ID
             
    Group_ID
      WHERE Date 
    some_Date 
        
    AND Date some_Date
    GROUP BY Page_ID
     
  6. SabreWolf3

    SabreWolf3 Thread Starter

    Joined:
    Nov 6, 2000
    Messages:
    227
    Hi Gibble,

    I had to modify your code a little for the syntax requirements of our database, but it seems to work very well.

    One question though. I had downloaded the data for September and already ran scripts to format it and calculate all my totals manually.

    So now I'm going back and using this new query to pull down results and compare them. I found a couple right off the bat where the new query's results were a little bit higher. After checking into the details, I found a few instances where it is listing duplicate values for a Date/Page_ID/Space_ID/Group_ID combination. How do I make sure that this doesn't happen? I just assumed that the GROUP BY clause would automatically roll these up into one row.

    Thanks,
    Richard
     
  7. Gibble

    Gibble

    Joined:
    Oct 9, 2001
    Messages:
    27,087
    You'll have to give me an example...I don't quit know what data is being duplicated, where, or what exactly the problem with the result is...
     
  8. Rockn

    Rockn

    Joined:
    Jul 29, 2001
    Messages:
    21,334
    You are getting duplicates because of the date range. If the database is collecing multiple instances of hits for the same page on the same date then the numbers will be wrong. Does the databae only collect the number of clicks and add one to the page_ID for a given day or does it collect all clicks individually?
     
  9. SabreWolf3

    SabreWolf3 Thread Starter

    Joined:
    Nov 6, 2000
    Messages:
    227
    Hi Gibble,

    Using the example I posted above, I should receive a total for the Page_ID (123) of 2,240 clicks. That's after the query groups the data by Date, Page_ID,Space_ID and Group_ID and gives the Max(Clicks) for each group. Then it SUMs these values to get the 2,240.

    Well, using data that I had already downloaded for September and I know is correct, I ran this query with the criteria for September and the total came back too high.

    So I just ran the subquery to see how the data was grouped before it SUMs the values and returns a total. When I ran the subquery, I see duplicate lines that are identical, which I thought the GROUP BY clause would eliminate.

    For example, using data from above... this is what I should see:

    10/01/2003 - 123 - 10 - 1 = 420
    10/01/2003 - 123 - 20 - 17 = 200
    10/01/2003 - 123 - 30 - 25 = 500
    10/02/2003 - 123 - 10 - 1 = 420
    10/02/2003 - 123 - 20 - 17 = 200
    10/02/2003 - 123 - 30 - 25 = 500

    What I am actually seeing is this:

    10/01/2003 - 123 - 10 - 1 = 420
    10/01/2003 - 123 - 20 - 17 = 200
    10/01/2003 - 123 - 20 - 17 = 200
    10/01/2003 - 123 - 30 - 25 = 500
    10/02/2003 - 123 - 10 - 1 = 420
    10/02/2003 - 123 - 10 - 1 = 420
    10/02/2003 - 123 - 20 - 17 = 200
    10/02/2003 - 123 - 30 - 25 = 500


    Notice how the 2nd & 3rd lines are duplicates? Also, the 5th & 6th lines as well. When the SUM is performed on these values, it returns 2,860... which is wrong.

    How do I make sure that when the data is grouped, that there are no duplicate values? If it groups by those 4 fields (Date, Page_ID, Space_ID, Group_ID), how can there be two rows exactly alike?

    Thanks,
    Richard
     
  10. Gibble

    Gibble

    Joined:
    Oct 9, 2001
    Messages:
    27,087
    Change the subquery to a SELECT DISTINCT, that shoudl get rid of the duplicates

    and now that I think about it, that makes sense...
     
  11. SabreWolf3

    SabreWolf3 Thread Starter

    Joined:
    Nov 6, 2000
    Messages:
    227
    Hi Rockn and Gibble...

    Actually, after further inspection... the query is working perfectly. The rows that I thought were duplicates in fact were not. The reporting tool that I was using to QA my data rolls up the clicks to a Group_Name instead of a Group_ID. What we found is that the same Group_Name can exist on the same page, but its Group_ID would be different. Since the reporting tool rolls up the lines to the Group_Name it was combining rows that it shouldn't have been, and the data from the query was perfect!

    Thanks again for all the help on that front. Now I have another question.

    I usually automate these queries by putting them into MSQuery, which allows you to set up your main criteria as parameters. I can then embed the query into an Excel spreadsheet and build a userform to collect the criteria, such as Page_ID and Start_Date and End_Date from the user. Then the query would just pull in that criteria from the anchored cells in my spreadsheet.

    However, since this uses a subquery, MSQuery cannot display the query graphically. Therefore, it tells me that I cannot use parameters.

    Can you guys think of any way for me to automate this query for my end users? For instance, is there a way to connect to the database directly from Excel?


    Thanks for all the help!
    -- Richard
     
  12. SabreWolf3

    SabreWolf3 Thread Starter

    Joined:
    Nov 6, 2000
    Messages:
    227
    Here I am back in this post again, with a follow-up question regarding the SQL statement.

    Can you tell me what's wrong with this query? Remember, this is formatted for MSQuery, and the RDBMS is IBM Redbrick 32.

    Code:
    SELECT DATE, BNAME, CNAME, PAGEID, PDESC, PURL, 
    Sum(IMPRESSIONS),Sum(CLICKS)
    
    FROM (SELECT USAGE_DAILY_0.CAL_DT as DATE,
    BRAND_0.BRAND_ID,
    BRAND_0.BRAND_NAME as BNAME,
    CHANNEL_0.CHANNEL_ID,
    CHANNEL_0.CHANNEL_NAME as CNAME,
    PAGE_0.PAGE_ID as PAGEID,
    PAGE_0.PAGE_DESC as PDESC,
    PAGE_0.PAGE_NAME as PURL,
    USAGE_DAILY_0.ADSPACE_POSITION_ID,
    USAGE_DAILY_0.CLUSTER_ID,
    Min(USAGE_DAILY_0.TOTAL_IMPRESSIONS) as IMPRESSIONS,
    Sum(USAGE_DAILY_0.TOTAL_RESPONSES) as CLICKS
    
    FROM USAGE_DAILY USAGE_DAILY_0, BRAND BRAND_0, CHANNEL CHANNEL_0, PAGE PAGE_0
    
    WHERE (BRAND_0.BRAND_ID = USAGE_DAILY_0.BRAND_ID) AND
    (CHANNEL_0.CHANNEL_ID = USAGE_DAILY_0.CHANNEL_ID) AND
    (PAGE_0.PAGE_ID = USAGE_DAILY_0.PAGE_ID) AND
    (USAGE_DAILY_0.CAL_DT >= {d '2003-09-01'}) AND
    (USAGE_DAILY_0.CAL_DT <= {d '2003-09-30'}) AND
    (USAGE_DAILY_0.BRAND_ID = 3) AND
    (USAGE_DAILY_0.CHANNEL_ID = 49)
    
    GROUP BY USAGE_DAILY_0.CAL_DT, 
    USAGE_DAILY_0.BRAND_ID, 
    BRAND_0.BRAND_NAME, 
    USAGE_DAILY_0.CHANNEL_ID, 
    CHANNEL_0.CHANNEL_NAME, 
    USAGE_DAILY_0.PAGE_ID, 
    PAGE_0.PAGE_DESC, 
    PAGE_0.PAGE_NAME, 
    USAGE_DAILY_0.ADSPACE_POSITION_ID, 
    USAGE_DAILY_0.CLUSTER_ID) as sum_min
    
    GROUP BY DATE, BNAME, CNAME, PDESC, PAGEID
    ORDER BY DATE, BNAME, CNAME, PDESC, PAGEID
    
    The error I am receiving is this:
    "Could not add the table '(SELECT'.

    This follows the exact same syntax as the query that I finally worked out to solve the problem I originally posted about. The only difference is that this query pulls in fields from other tables, but this syntax is the one MSQuery usually installs to do such.

    Thanks for the help!
    Richard
     
  13. Gibble

    Gibble

    Joined:
    Oct 9, 2001
    Messages:
    27,087
    I don't know if that database will support subselects like I used them. But I'm not sure as I am not familiar with IBM Redbrick
     
  14. SabreWolf3

    SabreWolf3 Thread Starter

    Joined:
    Nov 6, 2000
    Messages:
    227
    Yes, I know is support subselects, because I used your suggestions above to formulate a simple query. This query works and follows the same syntax.

    Code:
    SELECT PAGEID, Sum(IMPRESSIONS)
    FROM (SELECT USAGE_DAILY_0.CAL_DT,
    USAGE_DAILY_0.BRAND_ID,
    USAGE_DAILY_0.CHANNEL_ID,
    USAGE_DAILY_0.PAGE_ID as PAGEID,
    USAGE_DAILY_0.ADSPACE_POSITION_ID,
    USAGE_DAILY_0.CLUSTER_ID,
    Min(USAGE_DAILY_0.TOTAL_IMPRESSIONS) as IMPRESSIONS
     
    FROM USAGE_DAILY USAGE_DAILY_0 
    
    WHERE (USAGE_DAILY_0.CAL_DT>={d '2003-09-01'}) AND 
                   (USAGE_DAILY_0.CAL_DT<={d '2003-10-12'}) AND 
                   (USAGE_DAILY_0.BRAND_ID=3) AND 
                   (USAGE_DAILY_0.CHANNEL_ID=23151865)
     
    GROUP BY USAGE_DAILY_0.CAL_DT, 
    USAGE_DAILY_0.BRAND_ID,
    USAGE_DAILY_0.CHANNEL_ID, 
    USAGE_DAILY_0.PAGE_ID,
    USAGE_DAILY_0.ADSPACE_POSITION_ID,
    USAGE_DAILY_0.CLUSTER_ID) as sum_min 
    GROUP BY PAGEID 
    ORDER BY PAGEID
    
    The only difference is that this query pulls all of its data from a single table (usage_daily_0), and only uses ID fields.

    The query I am trying to make work, uses those ID fields to pull in descriptions from the various lookup tables.

    Table = Page_0 has page_name and page_desc for page_id
    Table = Brand_0 has brand_name for brand_id
    Table = Channel_0 has channel_name for channel_id
     
  15. Gibble

    Gibble

    Joined:
    Oct 9, 2001
    Messages:
    27,087
    If you run just the subselect on it's own are their no errors in it and does it return values. I'm asking because I think DATE could be a reserved word and the db not like you using it as a column alias.
     
  16. Sponsor

As Seen On
As Seen On...

Welcome to Tech Support Guy!

Are you looking for the solution to your computer problem? Join our site today to ask your question. This site is completely free -- paid for by advertisers and donations.

If you're not already familiar with forums, watch our Welcome Guide to get started.

Join over 733,556 other people just like you!

Loading...
Thread Status:
Not open for further replies.

Short URL to this thread: https://techguy.org/170827

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice