need some SQL help

Status
This thread has been Locked and is not open to further replies. Please start a New Thread if you're having a similar issue. View our Welcome Guide to learn how to use this site.

SabreWolf3

Thread Starter
Joined
Nov 6, 2000
Messages
227
Hi all,

I know BASIC SQL, which is usually all I need to get through my day. However, I have run across a report that needs to be generated that is a bit more complex.

I would like to learn how to do this, rather than asking my resident SQL guru, because he's not a very nice guy. I am hoping someone here can help.

I have a database table that has five columns. See image below. The columns are Date, Page_ID, Space_ID, Group_ID and Clicks.

Here is what I want to do:
• Only return DISTINCT Page_ID's from the table
• For each Page_ID, I want a MAXIMUM value of clicks returned for each Date/Page_ID/Space_ID/Group_ID combination.
• After finding these MAXIMUM values, I want all the maximum values SUMMED for each Page_ID.

In other words, for the example, my output would be:
Page_ID ===== Clicks
123 ===== 2240

You would have 6 Maximum values:
10/01/2003 - 123 - 10 - 1 = 420
10/01/2003 - 123 - 20 - 17 = 200
10/01/2003 - 123 - 30 - 25 = 500
10/02/2003 - 123 - 10 - 1 = 420
10/02/2003 - 123 - 20 - 17 = 200
10/02/2003 - 123 - 30 - 25 = 500

The sum of all these values would be 2,240.

Can anyone help me generate this SQL script?

 
Joined
Jul 29, 2001
Messages
21,334
I think you need to do a nested query on the results you already have and then group on the page_id and then sum the max of the hits.
 

SabreWolf3

Thread Starter
Joined
Nov 6, 2000
Messages
227
Hi Rockn,

Well, I don't already have any results... the image in the post is just an example that I made up. I need to have a query formulated, because I am turning this report into an automated utility that can be refreshed by the user with the click of a button.

The reason I need the query to pull distinct page_ids, do the calculation, and roll up to the page_id, is because there are over 100,000 page_id's in this table. If you don't pull distinct values, there will be at least 3 to 4 rows of data per page_id, which is beyond anything that Excel or Access can handle.

The report will use a date range as criteria, and the date range will be limited to 31 days at a time. This will ensure that there will never be more than 65,000 rows of data, and it can be handled by Excel.

I'm not sure I understand what a nested query is.
 
Joined
Jul 29, 2001
Messages
21,334
You first run a query to return all of the maximum values for a given Page_ID and a specified date range. Once those values are returned you run another query based on the first to return a sum of all the maximum values for that Page_ID. I did it with Access quite easily last night on a client database. You have any sample data I could play with to show you?
 
Joined
Oct 9, 2001
Messages
27,087
This should work:

PHP:
SELECT Page_ID, sum(Max_Clicks) FROM (
  SELECT Date, 
         Page_ID, 
         Space_ID, 
         Group_ID, 
         max(Clicks) AS Max_Clicks 
  FROM --TABLENAME-- 
  GROUP BY Date, 
         Page_ID, 
         Space_ID, 
         Group_ID
  WHERE Date > some_Date 
    AND Date < some_Date
) GROUP BY Page_ID
 

SabreWolf3

Thread Starter
Joined
Nov 6, 2000
Messages
227
Hi Gibble,

I had to modify your code a little for the syntax requirements of our database, but it seems to work very well.

One question though. I had downloaded the data for September and already ran scripts to format it and calculate all my totals manually.

So now I'm going back and using this new query to pull down results and compare them. I found a couple right off the bat where the new query's results were a little bit higher. After checking into the details, I found a few instances where it is listing duplicate values for a Date/Page_ID/Space_ID/Group_ID combination. How do I make sure that this doesn't happen? I just assumed that the GROUP BY clause would automatically roll these up into one row.

Thanks,
Richard
 
Joined
Oct 9, 2001
Messages
27,087
You'll have to give me an example...I don't quit know what data is being duplicated, where, or what exactly the problem with the result is...
 
Joined
Jul 29, 2001
Messages
21,334
You are getting duplicates because of the date range. If the database is collecing multiple instances of hits for the same page on the same date then the numbers will be wrong. Does the databae only collect the number of clicks and add one to the page_ID for a given day or does it collect all clicks individually?
 

SabreWolf3

Thread Starter
Joined
Nov 6, 2000
Messages
227
Hi Gibble,

Using the example I posted above, I should receive a total for the Page_ID (123) of 2,240 clicks. That's after the query groups the data by Date, Page_ID,Space_ID and Group_ID and gives the Max(Clicks) for each group. Then it SUMs these values to get the 2,240.

Well, using data that I had already downloaded for September and I know is correct, I ran this query with the criteria for September and the total came back too high.

So I just ran the subquery to see how the data was grouped before it SUMs the values and returns a total. When I ran the subquery, I see duplicate lines that are identical, which I thought the GROUP BY clause would eliminate.

For example, using data from above... this is what I should see:

10/01/2003 - 123 - 10 - 1 = 420
10/01/2003 - 123 - 20 - 17 = 200
10/01/2003 - 123 - 30 - 25 = 500
10/02/2003 - 123 - 10 - 1 = 420
10/02/2003 - 123 - 20 - 17 = 200
10/02/2003 - 123 - 30 - 25 = 500

What I am actually seeing is this:

10/01/2003 - 123 - 10 - 1 = 420
10/01/2003 - 123 - 20 - 17 = 200
10/01/2003 - 123 - 20 - 17 = 200
10/01/2003 - 123 - 30 - 25 = 500
10/02/2003 - 123 - 10 - 1 = 420
10/02/2003 - 123 - 10 - 1 = 420
10/02/2003 - 123 - 20 - 17 = 200
10/02/2003 - 123 - 30 - 25 = 500


Notice how the 2nd & 3rd lines are duplicates? Also, the 5th & 6th lines as well. When the SUM is performed on these values, it returns 2,860... which is wrong.

How do I make sure that when the data is grouped, that there are no duplicate values? If it groups by those 4 fields (Date, Page_ID, Space_ID, Group_ID), how can there be two rows exactly alike?

Thanks,
Richard
 
Joined
Oct 9, 2001
Messages
27,087
Change the subquery to a SELECT DISTINCT, that shoudl get rid of the duplicates

and now that I think about it, that makes sense...
 

SabreWolf3

Thread Starter
Joined
Nov 6, 2000
Messages
227
Hi Rockn and Gibble...

Actually, after further inspection... the query is working perfectly. The rows that I thought were duplicates in fact were not. The reporting tool that I was using to QA my data rolls up the clicks to a Group_Name instead of a Group_ID. What we found is that the same Group_Name can exist on the same page, but its Group_ID would be different. Since the reporting tool rolls up the lines to the Group_Name it was combining rows that it shouldn't have been, and the data from the query was perfect!

Thanks again for all the help on that front. Now I have another question.

I usually automate these queries by putting them into MSQuery, which allows you to set up your main criteria as parameters. I can then embed the query into an Excel spreadsheet and build a userform to collect the criteria, such as Page_ID and Start_Date and End_Date from the user. Then the query would just pull in that criteria from the anchored cells in my spreadsheet.

However, since this uses a subquery, MSQuery cannot display the query graphically. Therefore, it tells me that I cannot use parameters.

Can you guys think of any way for me to automate this query for my end users? For instance, is there a way to connect to the database directly from Excel?


Thanks for all the help!
-- Richard
 

SabreWolf3

Thread Starter
Joined
Nov 6, 2000
Messages
227
Here I am back in this post again, with a follow-up question regarding the SQL statement.

Can you tell me what's wrong with this query? Remember, this is formatted for MSQuery, and the RDBMS is IBM Redbrick 32.

Code:
SELECT DATE, BNAME, CNAME, PAGEID, PDESC, PURL, 
Sum(IMPRESSIONS),Sum(CLICKS)

FROM (SELECT USAGE_DAILY_0.CAL_DT as DATE,
BRAND_0.BRAND_ID,
BRAND_0.BRAND_NAME as BNAME,
CHANNEL_0.CHANNEL_ID,
CHANNEL_0.CHANNEL_NAME as CNAME,
PAGE_0.PAGE_ID as PAGEID,
PAGE_0.PAGE_DESC as PDESC,
PAGE_0.PAGE_NAME as PURL,
USAGE_DAILY_0.ADSPACE_POSITION_ID,
USAGE_DAILY_0.CLUSTER_ID,
Min(USAGE_DAILY_0.TOTAL_IMPRESSIONS) as IMPRESSIONS,
Sum(USAGE_DAILY_0.TOTAL_RESPONSES) as CLICKS

FROM USAGE_DAILY USAGE_DAILY_0, BRAND BRAND_0, CHANNEL CHANNEL_0, PAGE PAGE_0

WHERE (BRAND_0.BRAND_ID = USAGE_DAILY_0.BRAND_ID) AND
(CHANNEL_0.CHANNEL_ID = USAGE_DAILY_0.CHANNEL_ID) AND
(PAGE_0.PAGE_ID = USAGE_DAILY_0.PAGE_ID) AND
(USAGE_DAILY_0.CAL_DT >= {d '2003-09-01'}) AND
(USAGE_DAILY_0.CAL_DT <= {d '2003-09-30'}) AND
(USAGE_DAILY_0.BRAND_ID = 3) AND
(USAGE_DAILY_0.CHANNEL_ID = 49)

GROUP BY USAGE_DAILY_0.CAL_DT, 
USAGE_DAILY_0.BRAND_ID, 
BRAND_0.BRAND_NAME, 
USAGE_DAILY_0.CHANNEL_ID, 
CHANNEL_0.CHANNEL_NAME, 
USAGE_DAILY_0.PAGE_ID, 
PAGE_0.PAGE_DESC, 
PAGE_0.PAGE_NAME, 
USAGE_DAILY_0.ADSPACE_POSITION_ID, 
USAGE_DAILY_0.CLUSTER_ID) as sum_min

GROUP BY DATE, BNAME, CNAME, PDESC, PAGEID
ORDER BY DATE, BNAME, CNAME, PDESC, PAGEID
The error I am receiving is this:
"Could not add the table '(SELECT'.

This follows the exact same syntax as the query that I finally worked out to solve the problem I originally posted about. The only difference is that this query pulls in fields from other tables, but this syntax is the one MSQuery usually installs to do such.

Thanks for the help!
Richard
 
Joined
Oct 9, 2001
Messages
27,087
I don't know if that database will support subselects like I used them. But I'm not sure as I am not familiar with IBM Redbrick
 

SabreWolf3

Thread Starter
Joined
Nov 6, 2000
Messages
227
Yes, I know is support subselects, because I used your suggestions above to formulate a simple query. This query works and follows the same syntax.

Code:
SELECT PAGEID, Sum(IMPRESSIONS)
FROM (SELECT USAGE_DAILY_0.CAL_DT,
USAGE_DAILY_0.BRAND_ID,
USAGE_DAILY_0.CHANNEL_ID,
USAGE_DAILY_0.PAGE_ID as PAGEID,
USAGE_DAILY_0.ADSPACE_POSITION_ID,
USAGE_DAILY_0.CLUSTER_ID,
Min(USAGE_DAILY_0.TOTAL_IMPRESSIONS) as IMPRESSIONS
 
FROM USAGE_DAILY USAGE_DAILY_0 

WHERE (USAGE_DAILY_0.CAL_DT>={d '2003-09-01'}) AND 
               (USAGE_DAILY_0.CAL_DT<={d '2003-10-12'}) AND 
               (USAGE_DAILY_0.BRAND_ID=3) AND 
               (USAGE_DAILY_0.CHANNEL_ID=23151865)
 
GROUP BY USAGE_DAILY_0.CAL_DT, 
USAGE_DAILY_0.BRAND_ID,
USAGE_DAILY_0.CHANNEL_ID, 
USAGE_DAILY_0.PAGE_ID,
USAGE_DAILY_0.ADSPACE_POSITION_ID,
USAGE_DAILY_0.CLUSTER_ID) as sum_min 
GROUP BY PAGEID 
ORDER BY PAGEID
The only difference is that this query pulls all of its data from a single table (usage_daily_0), and only uses ID fields.

The query I am trying to make work, uses those ID fields to pull in descriptions from the various lookup tables.

Table = Page_0 has page_name and page_desc for page_id
Table = Brand_0 has brand_name for brand_id
Table = Channel_0 has channel_name for channel_id
 
Joined
Oct 9, 2001
Messages
27,087
If you run just the subselect on it's own are their no errors in it and does it return values. I'm asking because I think DATE could be a reserved word and the db not like you using it as a column alias.
 
Status
This thread has been Locked and is not open to further replies. Please start a New Thread if you're having a similar issue. View our Welcome Guide to learn how to use this site.

Users Who Are Viewing This Thread (Users: 0, Guests: 1)

As Seen On
As Seen On...

Welcome to Tech Support Guy!

Are you looking for the solution to your computer problem? Join our site today to ask your question. This site is completely free -- paid for by advertisers and donations.

If you're not already familiar with forums, watch our Welcome Guide to get started.

Join over 807,865 other people just like you!

Latest posts

Members online

Top