【涉及知识点】 最值:单列最大max(column),单列最小min(column);多列最大greatest(column1,column2),多列最小least(column1,column2),注意greatest和least里不能有值为null否则会返回null 剔重:distinct 排序:按某column列从小到大order by column,从大到小order by column desc 选取:limit m offset n 取第n+1至n+1+(m-1)=n+m名 没有结果时返回null:嵌套一层,内层没有结果什么也不会返回,外层在内层没有结果的情况下会返回null,也可以外层用ifnull更直观
Write a SQL query to get the second highest salary from the Employee table.
+----+--------+ | Id | Salary | +----+--------+ | 1 | 100 | | 2 | 200 | | 3 | 300 | +----+--------+ For example, given the above Employee table, the query should return 200 as the second highest salary. If there is no second highest salary, then the query should return null.
The Employee table holds all employees. Every employee has an Id, a salary, and there is also a column for the department Id.
+----+-------+--------+--------------+ | Id | Name | Salary | DepartmentId | +----+-------+--------+--------------+ | 1 | Joe | 70000 | 1 | | 2 | Jim | 90000 | 1 | | 3 | Henry | 80000 | 2 | | 4 | Sam | 60000 | 2 | | 5 | Max | 90000 | 1 | +----+-------+--------+--------------+ The Department table holds all departments of the company.
+----+----------+ | Id | Name | +----+----------+ | 1 | IT | | 2 | Sales | +----+----------+ Write a SQL query to find employees who have the highest salary in each of the departments. For the above tables, your SQL query should return the following rows (order of rows does not matter).
+------------+----------+--------+ | Department | Employee | Salary | +------------+----------+--------+ | IT | Max | 90000 | | IT | Jim | 90000 | | Sales | Henry | 80000 | +------------+----------+--------+ Explanation:
Max and Jim both have the highest salary in the IT department and Henry has the highest salary in the Sales department.
select d.name as 'Department', e.name as 'Employee', e.Salary from employee e join department d on d.id=e.departmentid join (select max(salary) as mxsalary, departmentid from employee group by departmentid) mx on mx.mxsalary=e.salary and mx.departmentid=e.departmentid
The Employee table holds all employees. Every employee has an Id, and there is also a column for the department Id.
+----+-------+--------+--------------+ | Id | Name | Salary | DepartmentId | +----+-------+--------+--------------+ | 1 | Joe | 85000 | 1 | | 2 | Henry | 80000 | 2 | | 3 | Sam | 60000 | 2 | | 4 | Max | 90000 | 1 | | 5 | Janet | 69000 | 1 | | 6 | Randy | 85000 | 1 | | 7 | Will | 70000 | 1 | +----+-------+--------+--------------+ The Department table holds all departments of the company.
+----+----------+ | Id | Name | +----+----------+ | 1 | IT | | 2 | Sales | +----+----------+ Write a SQL query to find employees who earn the top three salaries in each of the department. For the above tables, your SQL query should return the following rows (order of rows does not matter).
+------------+----------+--------+ | Department | Employee | Salary | +------------+----------+--------+ | IT | Max | 90000 | | IT | Randy | 85000 | | IT | Joe | 85000 | | IT | Will | 70000 | | Sales | Henry | 80000 | | Sales | Sam | 60000 | +------------+----------+--------+ Explanation:
In IT department, Max earns the highest salary, both Randy and Joe earn the second highest salary, and Will earns the third highest salary. There are only two employees in the Sales department, Henry earns the highest salary while Sam earns the second highest salary.
select d.name as 'Department', e1.name as 'Employee', e1.Salary from employee e1 join department d on e1.departmentid=d.id join employee e2 on e1.departmentid=e2.departmentid where e1.salary<=e2.salary group by e1.id having count(distinct(e2.salary))<=3
注意: group by的是id,而不是departmentid,这是因为e1和e2已经按照同departmentid来比较了,group by主要是为了后面的having count(),而不是为了区分部门。后面可以用e1.id或者e1.name,但不能用e2的id或name,也不能用departmentid或者salary。
问题2 对比
【涉及知识点】 核心思路:先拆表再并表,把表按照一个参数一个表拆为多个表,再对比或用join连接各表进行数值对比 并表:select xxx from table1 join table2 join table3 on table1.column1=table2.column2 and table1.column3=table3.column4 拆表:select xxx from table as t1, table as t2 where
The Employee table holds all employees including their managers. Every employee has an Id, and there is also a column for the manager Id.
+----+-------+--------+-----------+ | Id | Name | Salary | ManagerId | +----+-------+--------+-----------+ | 1 | Joe | 70000 | 3 | | 2 | Henry | 80000 | 4 | | 3 | Sam | 60000 | NULL | | 4 | Max | 90000 | NULL | +----+-------+--------+-----------+ Given the Employee table, write a SQL query that finds out employees who earn more than their managers. For the above table, Joe is the only employee who earns more than his manager.
+----------+ | Employee | +----------+ | Joe | +----------+
X city built a new stadium, each day many people visit it and the stats are saved as these columns: id, visit_date, people
Please write a query to display the records which have 3 or more consecutive rows and the amount of people more than 100(inclusive).
For example, the table stadium: +------+------------+-----------+ | id | visit_date | people | +------+------------+-----------+ | 1 | 2017-01-01 | 10 | | 2 | 2017-01-02 | 109 | | 3 | 2017-01-03 | 150 | | 4 | 2017-01-04 | 99 | | 5 | 2017-01-05 | 145 | | 6 | 2017-01-06 | 1455 | | 7 | 2017-01-07 | 199 | | 8 | 2017-01-08 | 188 | +------+------------+-----------+ For the sample data above, the output is:
+------+------------+-----------+ | id | visit_date | people | +------+------------+-----------+ | 5 | 2017-01-05 | 145 | | 6 | 2017-01-06 | 1455 | | 7 | 2017-01-07 | 199 | | 8 | 2017-01-08 | 188 | +------+------------+-----------+ Note: Each day only have one row record, and the dates are increasing with id increasing.
select distinct s1.* from stadium s1, stadium s2, stadium s3 where s1.people>=100 and s2.people>=100 and s3.people>=100 and ((s1.id+1=s2.id and s2.id+1=s3.id) or (s2.id+1=s1.id and s1.id+1=s3.id) or (s3.id+1=s2.id and s2.id+1=s1.id)) order by id
Write a SQL query to find all duplicate emails in a table named Person.
+----+---------+ | Id | Email | +----+---------+ | 1 | a@b.com | | 2 | c@d.com | | 3 | a@b.com | +----+---------+ For example, your query should return the following for the above table:
+---------+ | Email | +---------+ | a@b.com | +---------+ Note: All emails are in lowercase.
There is a table courses with columns: student and class
Please list out all classes which have more than or equal to 5 students.
For example, the table:
+---------+------------+ | student | class | +---------+------------+ | A | Math | | B | English | | C | Math | | D | Biology | | E | Math | | F | Computer | | G | Math | | H | Math | | I | Math | +---------+------------+ Should output:
+---------+ | class | +---------+ | Math | +---------+
Note: The students should not be counted duplicate in each course.
X city opened a new cinema, many people would like to go to this cinema. The cinema also gives out a poster indicating the movies’ ratings and descriptions. Please write a SQL query to output movies with an odd numbered ID and a description that is not 'boring'. Order the result by rating.
For example, table cinema:
+---------+-----------+--------------+-----------+ | id | movie | description | rating | +---------+-----------+--------------+-----------+ | 1 | War | great 3D | 8.9 | | 2 | Science | fiction | 8.5 | | 3 | irish | boring | 6.2 | | 4 | Ice song | Fantacy | 8.6 | | 5 | House card| Interesting| 9.1 | +---------+-----------+--------------+-----------+ For the example above, the output should be: +---------+-----------+--------------+-----------+ | id | movie | description | rating | +---------+-----------+--------------+-----------+ | 5 | House card| Interesting| 9.1 | | 1 | War | great 3D | 8.9 | +---------+-----------+--------------+-----------+
【解析】 这道题非常简单,仅需知道余数的表达方式即可,余数用%则只有mysql支持,用mod()则其它如oracle等也支持,这道题的description里的boring是精确的,可以直接用=,但考虑到大部分时候描述和评价会是一小段话而非一个词,因此用了字符串模糊匹配的not like '%xxx%'
1 2 3 4 5
select * from cinema where id%2=1 and description not like '%boring%' order by rating desc
问题6 字符
【涉及知识点】 字符串匹配: 精确匹配用=,模糊匹配用like或not like, 模糊匹配不确定前后有多少字符的时候用'%word%',模糊匹配确定前后有多少个字符的时候用'w_rd', 模糊匹配多个可以包含的字符时用'where city like [ANL]%',即返回所有以A或N或L字母打头的城市名, 模糊匹配多个需要排除的字符时用'where city like [!ANL]',即返回所有不以A或N或L字母打头的城市名。 字符串长度: len(column),返回这列每个值的字符串长度 字符串定位: locate(abc, column),返回这列每个值中,字符abc出现的位置,如无返回0 截取字符串: mid(column, x, n),返回这列每个值中,从第x个字符开始一直截取n个字符的这一串子字符串
【例题】 问题5的例题262、问题11的例题620解析中有涉及到字符串处理,不赘述。
问题7 替换
【涉及知识点】 替换问题的核心其实是多条件赋值,如果只有较少条件的时候可以用if(),如果条件多时需要用case when if的基本用法:if(条件, 符合条件时的赋值1, 不符合条件时的赋值2) case when的基本用法:case when 条件1 then 赋值1 when 条件2 then 赋值2 else 赋值3 end
Given a table salary, such as the one below, that has m=male and f=female values. Swap all f and m values (i.e., change all f values to m and vice versa) with a single update statement and no intermediate temp table.
Note that you must write a single update statement, DO NOT write any select statement for this problem.
Example:
| id | name | sex | salary | |----|------|-----|--------| | 1 | A | m | 2500 | | 2 | B | f | 1500 | | 3 | C | m | 5500 | | 4 | D | f | 500 |
After running your update statement, the above salary table should have the following rows:
| id | name | sex | salary | |----|------|-----|--------| | 1 | A | f | 2500 | | 2 | B | m | 1500 | | 3 | C | f | 5500 | | 4 | D | m | 500 |
【解析】 解题思路有两种,一种是用if,因为这里只有两个赋值,一种是用更具普遍性的case when。 需要额外提到的是,这是一个更新数据库而非通常的提取数据用于分析的场景,需要用到update而非日常的select,更新数据update的基本用法是:update 表格名 set 需修改参数列名 = 修改内容
第一种思路,case when,具体如下:
1 2 3 4
update salary set sex = case sex when 'f'then'm' else'f' end
需要注意的是,修改的列和条件的列都是sex,如果不是同一列的话比如修改的是name,用于判断的条件是id,那么会写作update salary set name= case id when 1 then 'aaa' when 2 then 'bbb' else 'ccc' end 另,如果不写作case sex when,而是直接用case when,把sex放到后面去也是可以的,具体写作:
1 2 3 4 5
update salary set sex = case when sex='f' then'm' else'f' end
第二种思路,if,因为这里sex只有两个取值,所以用if是足够的,具体如下:
1 2
update salary set sex = if(sex='f','m','f')
需要注意的是,if后面的条件必须完整的写为sex=xxx,而不能因为前面已经有sex=了就省掉,因为这里的条件和要赋值的列都是sex,但可能存在不是同一个列的情况,比如按照上面的栗子会写作update salary set name = if(id=1,'aaa','bbb')
Mary is a teacher in a middle school and she has a table seat storing students' names and their corresponding seat ids. The column id is continuous increment. Mary wants to change seats for the adjacent students. Can you write a SQL query to output the result for Mary? +---------+---------+ | id | student | +---------+---------+ | 1 | Abbot | | 2 | Doris | | 3 | Emerson | | 4 | Green | | 5 | Jeames | +---------+---------+ For the sample input, the output is: +---------+---------+ | id | student | +---------+---------+ | 1 | Doris | | 2 | Abbot | | 3 | Green | | 4 | Emerson | | 5 | Jeames | +---------+---------+ Note: If the number of students is odd, there is no need to change the last one's seat.
select (case when id%2=0 then id-1 when id%2=1 and id<>maxid then id+1 else id end ) as id, student from seat, (select max(id) as maxid from seat) mx order by id
注意: 不能在case when的其中一个枝干里直接用case when id%2=1 and id not in (select max(id) from seat) mx then id+1,而须按照上面的写法把mx放到from之后,否则会报错。
问题8 扩表
【涉及知识点】 扩表是把记录不连续的表格扩大成包含一系列参数的大表,通常包含两类操作:列转行或行转列、对无记录的条目进行赋值 列转行通常有三种做法:if;case when;pivot 行转列通常做法为:union/union all if和case when的用法在问题7替换中已经讲过了,不赘述。 pivot的基本用法:
1 2 3
select column2value1,column2value2,column2value3 from (select * from originaltable) as base pivot(max/min/sum(column1) for column2 in (column2value1,column2value2,column2value3)) as pvt
union /union all基本用法:
1 2 3
select column from table union (all) select column from table
+---------------+---------+ | Column Name | Type | +---------------+---------+ | id | int | | revenue | int | | month | varchar | +---------------+---------+ (id, month) is the primary key of this table. The table has information about the revenue of each department per month. The month has values in ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].
Write an SQL query to reformat the table such that there is a department id column and a revenue column for each month.
The query result format is in the following example:
Department table: +------+---------+-------+ | id | revenue | month | +------+---------+-------+ | 1 | 8000 | Jan | | 2 | 9000 | Jan | | 3 | 10000 | Feb | | 1 | 7000 | Feb | | 1 | 6000 | Mar | +------+---------+-------+
select distinct id, sum(CASE when month='jan'then revenue else null END) as 'Jan_Revenue', sum(CASE when month='feb'then revenue else null END) as 'Feb_Revenue', sum(CASE when month='mar'then revenue else null END) as 'Mar_Revenue', sum(CASE when month='apr'then revenue else null END) as 'Apr_Revenue', sum(CASE when month='may'then revenue else null END) as 'May_Revenue', sum(CASE when month='jun'then revenue else null END) as 'Jun_Revenue', sum(CASE when month='jul'then revenue else null END) as 'Jul_Revenue', sum(CASE when month='aug'then revenue else null END) as 'Aug_Revenue', sum(CASE when month='sep'then revenue else null END) as 'Sep_Revenue', sum(CASE when month='oct'then revenue else null END) as 'Oct_Revenue', sum(CASE when month='nov'then revenue else null END) as 'Nov_Revenue', sum(CASE when month='dec'then revenue else null END) as 'Dec_Revenue' from department group by id
备注: 1)需要注意的是用到了sum()和group by id,因为如果不用sum的话,就会出现每条记录对应的别的记录其实是有数值的会在这条记录中显示为null,因此需要用sum()或者max()来在多个取值中选择有数值的而非null,这道题目因为每个月每个部门只有一条记录,所以用max或者sum都可以,但如果是有多条记录的话,用sum()会更合适 2) 注意是group by id而不是group by month,因为已经把每个月给单独写出来了,不需要通过group by month来为不同月份分组计算,使用group by的目的是为了sum,因此需用id这种不会因group而汇总的参数。 3)distinct只是写作习惯,不写也可以,但实测发现写distinct的运算时间为690m,不写distinct会更慢运算时间要1116ms
第二种方法,if()函数,具体如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
select id, sum(if(month='jan', revenue, null)) as 'Jan_Revenue', sum(if(month='feb', revenue, null)) as 'Feb_Revenue', sum(if(month='mar', revenue, null)) as 'Mar_Revenue', sum(if(month='apr', revenue, null)) as 'Apr_Revenue', sum(if(month='may', revenue, null)) as 'May_Revenue', sum(if(month='jun', revenue, null)) as 'Jun_Revenue', sum(if(month='jul', revenue, null)) as 'Jul_Revenue', sum(if(month='aug', revenue, null)) as 'Aug_Revenue', sum(if(month='sep', revenue, null)) as 'Sep_Revenue', sum(if(month='oct', revenue, null)) as 'Oct_Revenue', sum(if(month='nov', revenue, null)) as 'Nov_Revenue', sum(if(month='dec', revenue, null)) as 'Dec_Revenue' from department group by id
备注: 1)仍是用到了sum()和group by 2)这种方法运算时间为734ms,很有趣的是如果加distinct,运算时间为774ms,相差不大
第三种方法,pivot,具体如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
select id, jan as Jan_revenue, feb as Feb_revenue, mar as Mar_revenue, apr as Apr_revenue, may as May_revenue, jun as Jun_revenue, jul as Jul_revenue, aug as Aug_revenue, sep as Sep_revenue, oct as Oct_revenue, nov as Nov_revenue, dec as Dec_revenue from (select id,revenue,month from department) as base pivot (sum(revenue) for month in(jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec)) as pvt
Write a SQL query to get the nth highest salary from the Employee table.
+----+--------+ | Id | Salary | +----+--------+ | 1 | 100 | | 2 | 200 | | 3 | 300 | +----+--------+ For example, given the above Employee table, the nth highest salary where n = 2 is 200. If there is no nth highest salary, then the query should return null.
【涉及知识点】 需要注意的是,这里所说的排名与排序不同,重点是需要为每行返回一个代表其名次的数字,这涉及到的核心功能仍是自定义函数,但不同于create function,这里的自定义函数是在查询语句中间实现的,也并不需要面向不同的输入参数。 通过自定义函数创建一列新的行序数字的基本格式是: select table.*, @rownum:=@rownum+1 from table, (select @rownum:=0) as r
Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no "holes" between ranks.
+----+-------+ | Id | Score | +----+-------+ | 1 | 3.50 | | 2 | 3.65 | | 3 | 4.00 | | 4 | 3.85 | | 5 | 4.00 | | 6 | 3.65 | +----+-------+ For example, given the above Scores table, your query should generate the following report (order by highest score):
select s1.Score, s3.Rank from scores s1 left join ( select s2.score, @rownum:=@rownum+1 as Rank from (select distinct score from scores order by score desc) s2, (select @rownum:=0) r ) s3 on s1.score=s3.score order by rank
The Trips table holds all taxi trips. Each trip has a unique Id, while Client_Id and Driver_Id are both foreign keys to the Users_Id at the Users table. Status is an ENUM type of (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’).
+----------+--------+--------+ | Users_Id | Banned | Role | +----------+--------+--------+ | 1 | No | client | | 2 | Yes | client | | 3 | No | client | | 4 | No | client | | 10 | No | driver | | 11 | No | driver | | 12 | No | driver | | 13 | No | driver | +----------+--------+--------+ Write a SQL query to find the cancellation rate of requests made by unbanned users (both client and driver must be unbanned) between Oct 1, 2013 and Oct 3, 2013. The cancellation rate is computed by dividing the number of canceled (by client or driver) requests made by unbanned users by the total number of requests made by unbanned users.
For the above tables, your SQL query should return the following rows with the cancellation rate being rounded to two decimal places.
select distinct t.request_at as 'Day', round(sum(case when t.status like '%cancelled%'then 1 else 0 end)/count(t.id), 2) as 'Cancellation Rate' from trips t join users u1 join users u2 on t.client_id=u1.users_id and t.driver_id=u2.users_id where u1.banned='no'and u2.banned='no' and t.request_at>='2013-10-01' and t.request_at<='2013-10-03' group by request_at order by request_at
注意: 1)like前面没有is,后面的模糊字符串需要用''框起来。 2)三个表可以一起连,直接join join on,而无需join on join on 3)日期可以直接用大于小于这样的符号,但需注意,日期也需用''框起来,否则会报错。
数据库维护相关
问题1 删重
【涉及知识点】 删除某行的格式:delete * from table where,和select的格式是一样的,只是换成delete,且因为是删整行,所以一般是*或者table.*而不像select需要写明选出来的是行里的哪些列参数。 删重还是剔重:一般原则上是不要删数据,用的时候选出不重复的就行了,也即是用distinct,但不排除有时确有删除数据库中的冗余数据的维护数据库而非从数据库中调用数据进行分析的情况。
Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.
+----+------------------+ | Id | Email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | | 3 | john@example.com | +----+------------------+ Id is the primary key column for this table. For example, after running your query, the above Person table should have the following rows: