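At work I ran into the following requirement:
1. Randomly extract 3,000 phone numbers from each county or district as a sample, 30,000 numbers in total across the city's 10 counties and districts.
2. The sampled numbers should be as scattered as possible.
I had never done this kind of data extraction before, so I collected some material online and summarized it below: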
A) The random function package: dbms_random
a) Basics
The specification of the DBMS_RANDOM package, including all of its functions, can be read from the data dictionary in SQL*Plus:
- select text from all_source
- where name = 'DBMS_RANDOM' and type = 'PACKAGE'
- order by line;
You can also view the package body:
- select text from all_source
- where name = 'DBMS_RANDOM' and type = 'PACKAGE BODY'
- order by line;
The source says it all (this is the 10g version):
- PACKAGE dbms_random AS
-
- ------------
- -- OVERVIEW
- --
- -- This package should be installed as SYS. It generates a sequence of
- -- random 38-digit Oracle numbers. The expected length of the sequence
- -- is about power(10,28), which is hopefully long enough.
- ----------
-
- --USAGE
- --
- -- This is a random number generator. Do not use for cryptography.
- -- For more options the cryptographic toolkit should be used.
- --
- -- By default, the package is initialized with the current user
- -- name, current time down to the second, and the current session.
- --
- -- If this package is seeded twice with the same seed, then accessed
- -- in the same way, it will produce the same results in both cases.
- --
- --------
-
- -- EXAMPLES
- --
- -- To initialize or reset the generator, call the seed procedure as in:
- -- execute dbms_random.seed(12345678);
- -- or
- -- execute dbms_random.seed(TO_CHAR(SYSDATE,'MM-DD-YYYY HH24:MI:SS'));
- -- To get the random number, simply call the function, e.g.
- -- my_random_number BINARY_INTEGER;
- -- my_random_number := dbms_random.random;
- -- or
- -- my_random_real NUMBER;
- -- my_random_real := dbms_random.value;
- -- To use in SQL statements:
- -- select dbms_random.value from dual;
- -- insert into a values (dbms_random.value);
- -- variable x NUMBER;
- -- execute :x := dbms_random.value;
- -- update a set a2=a2+1 where a1 < :x;
-
- -- Seed with a binary integer
- PROCEDURE seed(val IN BINARY_INTEGER);
- PRAGMA restrict_references(seed, WNDS);
-
- -- Seed with a string (up to length 2000)
- PROCEDURE seed(val IN VARCHAR2);
- PRAGMA restrict_references(seed, WNDS);
-
- -- Get a random 38-digit precision number, 0.0 <= value < 1.0
- FUNCTION value RETURN NUMBER;
- PRAGMA restrict_references(value, WNDS);
-
- -- get a random Oracle number x, low <= x < high
- FUNCTION value(low IN NUMBER, high IN NUMBER) RETURN NUMBER;
- PRAGMA restrict_references(value, WNDS);
-
- -- get a random number from a normal distribution
- FUNCTION normal RETURN NUMBER;
- PRAGMA restrict_references(normal, WNDS);
-
- -- get a random string
- FUNCTION string(opt char, len NUMBER)
-
- /* "opt" specifies that the returned string may contain:
- 'u','U' : upper case alpha characters only
- 'l','L' : lower case alpha characters only
- 'a','A' : alpha characters only (mixed case)
- 'x','X' : any alpha-numeric characters (upper)
- 'p','P' : any printable characters
- */
-
- RETURN VARCHAR2; -- string of <len> characters
- PRAGMA restrict_references(string, WNDS);
-
- -- Obsolete, just calls seed(val)
- PROCEDURE initialize(val IN BINARY_INTEGER);
- PRAGMA restrict_references(initialize, WNDS);
-
- -- Obsolete, get integer in ( -power(2,31) <= random < power(2,31) )
- FUNCTION random RETURN BINARY_INTEGER;
- PRAGMA restrict_references(random, WNDS);
-
- -- Obsolete, does nothing
- PROCEDURE terminate;
- TYPE num_array IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
-
- END dbms_random;
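The package comments above state that seeding twice with the same seed and then calling the generator in the same way reproduces the same results. A minimal sketch to check this from SQL*Plus follows; the variable names first_run and second_run are my own, and the seed value is the one used in the package comments.

```sql
SET SERVEROUTPUT ON

DECLARE
  first_run  NUMBER;  -- first value drawn after seeding
  second_run NUMBER;  -- first value drawn after re-seeding with the same seed
BEGIN
  DBMS_RANDOM.SEED(12345678);
  first_run := DBMS_RANDOM.VALUE;

  DBMS_RANDOM.SEED(12345678);
  second_run := DBMS_RANDOM.VALUE;

  IF first_run = second_run THEN
    DBMS_OUTPUT.PUT_LINE('same seed, same value');
  ELSE
    DBMS_OUTPUT.PUT_LINE('values differ');
  END IF;
END;
/
```

Explicit seeding is useful when a sample has to be reproducible. Without a SEED call, the package initializes itself from the user name, the current time and the session, as the comments describe.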
b) Examples:
- select dbms_random.value from dual;
-
- select dbms_random.value() from dual;
By default, the two calls above return a number from 0.0 up to (but not including) 1.0.
- select dbms_random.random from dual;
-
- select dbms_random.random() from dual;
The two calls above return an integer.
- select abs(mod(dbms_random.random,100)) from dual; -- remainder method
- select trunc(dbms_random.value(0,100)) from dual; -- truncation method
Both statements above return an integer within a given range (0 to 99 here).
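c) Going further
By default, DBMS_RANDOM.VALUE returns a random number between 0 and 1.
The NORMAL function returns numbers drawn from a normal distribution with mean 0 and standard deviation 1. About 68% of the values it returns fall between -1 and +1, about 95% between -2 and +2, and about 99.7% between -3 and +3.
The STRING function returns a random string of the requested length, built from the character class selected by its opt argument.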
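The percentages quoted for NORMAL can be checked empirically. The sketch below assumes Oracle 10g or later, where CONNECT BY LEVEL against dual is a common row-generator idiom. The sample size of 100000 and the column aliases are arbitrary choices of mine.

```sql
-- Draw 100000 values from DBMS_RANDOM.NORMAL and measure what share
-- falls within 1, 2 and 3 standard deviations of the mean.
SELECT ROUND(100 * AVG(CASE WHEN ABS(z) <= 1 THEN 1 ELSE 0 END), 1) AS pct_within_1,
       ROUND(100 * AVG(CASE WHEN ABS(z) <= 2 THEN 1 ELSE 0 END), 1) AS pct_within_2,
       ROUND(100 * AVG(CASE WHEN ABS(z) <= 3 THEN 1 ELSE 0 END), 1) AS pct_within_3
FROM (
  SELECT DBMS_RANDOM.NORMAL AS z
  FROM dual
  CONNECT BY LEVEL <= 100000
);
```

The three columns should come out close to the quoted percentages, with small run-to-run differences because the draws are random.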
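B) Generating text and date values with DBMS_RANDOM
Numbers, text strings and dates are the three data types you meet most often in tables. The DBMS_RANDOM PL/SQL package is best known for generating random numbers, but it can also be used to produce random text and date values.
a) Generating random numbers
Let's start with numbers. The VALUE function returns a number greater than or equal to 0 and less than 1, with 38 digits of precision.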
- SELECT DBMS_RANDOM.VALUE FROM DUAL;
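For an integer in a given range, pass the low_value and high_value arguments and truncate the fractional part of the result (the upper bound itself is never returned). So for an integer between 0 and 99, use the following: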
- SELECT TRUNC(DBMS_RANDOM.VALUE(0, 100)) FROM DUAL;
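Because the upper bound is exclusive, an inclusive integer range from low to high needs high + 1 as the second argument. A small sketch, assuming we want a die roll from 1 to 6; the alias die and the 6000-row check are illustrative, and the CONNECT BY LEVEL row generator assumes Oracle 10g or later.

```sql
-- One integer from 1 to 6 inclusive
SELECT TRUNC(DBMS_RANDOM.VALUE(1, 7)) AS die FROM dual;

-- Rough uniformity check: roll 6000 times and count each face
SELECT die, COUNT(*) AS rolls
FROM (
  SELECT TRUNC(DBMS_RANDOM.VALUE(1, 7)) AS die
  FROM dual
  CONNECT BY LEVEL <= 6000
)
GROUP BY die
ORDER BY die;
```

Each face should show up roughly 1000 times in the second query. A face that never appears usually means the upper bound was left at high instead of high + 1.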
b) Generating random text strings
To generate a random text string, call the STRING function with a code for the string type and the desired length:
- SELECT DBMS_RANDOM.STRING('A', 20) FROM DUAL;
The type codes are documented in the Oracle Database 10g PL/SQL Packages and Types Reference.
Some of the type codes:
'U' generates uppercase letters
'L' generates lowercase letters
'A' generates mixed-case letters
c) Generating random dates
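Oracle can express a date as a Julian day number, an integer count of days since a fixed day in the past (January 1, 4712 BC, if you are curious). This means you can generate a random date in a given range by finding the integer that corresponds to your start date and adding a random integer to it.
With the TO_CHAR function and the 'J' format code, you can get the day number for today's date: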
- SELECT TO_CHAR(SYSDATE, 'J') FROM DUAL;
For example, to generate an arbitrary date in 2003, first determine the day number for January 1, 2003:
- SELECT TO_CHAR(TO_DATE('01/01/03','mm/dd/yy'),'J')FROM DUAL;
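The result is 2452641. To generate an arbitrary date within that year, call DBMS_RANDOM.VALUE with low_value 2452641 and high_value 2452641+365, then convert the result to a date. The upper bound is exclusive, so after truncation the offsets run from 0 to 364, which covers all 365 days of 2003; a bound of 2452641+364 would never produce December 31.
- SELECT TO_DATE(TRUNC(DBMS_RANDOM.VALUE(2452641,2452641+365)),'J') FROM DUAL;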
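The same kind of result can be reached with plain date arithmetic, without looking up the Julian number first. This is a sketch that assumes ANSI date literals (DATE '...') are available in your release. Since the upper bound of VALUE is exclusive, using the number of days in the year as the bound makes every day from January 1 through December 31 reachable. The alias random_day is mine.

```sql
-- Random day in 2003: start date plus a whole-day offset of 0..364.
-- Subtracting two dates yields the number of days between them (365 here).
SELECT DATE '2003-01-01'
       + TRUNC(DBMS_RANDOM.VALUE(0, DATE '2004-01-01' - DATE '2003-01-01'))
       AS random_day
FROM dual;
```

Writing the bound as a difference of two dates also keeps the query correct for leap years, where the range is 366 days.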
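C) Retrieving random rows in Oracle
1. Basic ways Oracle accesses data:
1) Full table scan: Oracle reads every row in the table and checks whether each one satisfies the WHERE condition. It reads the data blocks allocated to the table sequentially, and reads each block only once, so a full table scan can benefit from multiblock reads.
2) Sample table scan: the scan returns a random sample of the table's data. This access path requires the SAMPLE or SAMPLE BLOCK option in the FROM clause.
Note: sample table scans have been available since Oracle8i.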
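2. Using SAMPLE to get a random result set
2.1 Syntax: SAMPLE [ BLOCK ] (sample_percent) [ SEED (seed_value) ]
SAMPLE: performs a full table scan with row-level sampling. Oracle reads the specified percentage of rows from the table and then checks them against the WHERE clause to produce the result.
BLOCK: samples random blocks instead of random rows.
sample_percent: the percentage of the table's rows to pick at random. For example, a value of 10 means a random 10% of the rows. The value must be greater than or equal to .000001 and less than 100.
SEED: asks Oracle to try to return the same sample from one execution to the next, so repeated runs with the same seed_value give a fixed result. The value must be between 0 and 4294967295.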
2.2 Examples
Create a table for testing:
- create table zeeno as select * from dba_objects;
1) sample(sample_percent):
- -- Row sampling through a full table scan: draw a random 10% of the rows in zeeno, then return 5 of them
- SQL>select object_name from zeeno sample(10) where rownum<6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- UET$
- VIEW$
- I_SUPEROBJ2
- TRIGGERCOL$
- I_VIEW1
-
- SQL> /
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- I_FILE1
- IND$
- CLU$
- FET$
- I_COBJ#
-
2) sample block(sample_percent)
- -- Block sampling through a sample table scan: draw a random 10% of the blocks in zeeno, then return 5 rows
- SQL> select object_name from zeeno sample block(10) where rownum<6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- URIFACTORY
- DBMS_XMLGEN
- DBMS_XMLGEN
- DBMS_XMLSTORE
- DBMS_XMLSTORE
-
3) sample(sample_percent) seed(seed_value)
- -- With SEED the result set is fixed: row-sample 10% of zeeno with seed 10, then return 5 rows.
- SQL> select object_name from zeeno sample(10) seed(10) where rownum<6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- UET$
- I_CON1
- I_FILE2
- FET$
- I_COL1
-
- SQL> select object_name from zeeno sample(10) seed(10) where rownum<6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- UET$
- I_CON1
- I_FILE2
- FET$
- I_COL1
-
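Notes:
1. SAMPLE applies to a single table; it cannot be used with joins or remote tables (at least in the early releases that introduced it).
2. SAMPLE makes the statement use the cost-based optimizer (CBO) automatically.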
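Note that ROWNUM < 6 keeps the first five sampled rows in scan order, which is likely why the outputs above keep showing objects from the early part of the table. If the picked rows should be spread over the whole sample, one option is to shuffle the sample before cutting it. This sketch combines SAMPLE with an ORDER BY on DBMS_RANDOM.VALUE on the same zeeno table.

```sql
-- Take a 10% row sample, shuffle it, then keep 5 rows.
SELECT object_name
FROM (
  SELECT object_name
  FROM zeeno SAMPLE (10)
  ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 5;
```

Only the sampled rows have to be sorted, so this should stay cheaper than ordering the full table by a random key. The sample percentage is a probability rather than an exact count, so the size of the inner result varies a little from run to run.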
3. Using the DBMS_RANDOM package
DBMS_RANDOM is mainly used in two ways: DBMS_RANDOM.VALUE() and DBMS_RANDOM.RANDOM.
3.1 Getting random numbers
- SQL> select dbms_random.value() from dual;
-
- DBMS_RANDOM.VALUE()
- -------------------
- 0.146123095968043
-
- SQL> select dbms_random.value() from dual;
-
- DBMS_RANDOM.VALUE()
- -------------------
- 0.90175764902345
With a range, VALUE(1,10) returns a number from 1 up to (but not including) 10:
- SQL> select dbms_random.value(1,10) from dual;
-
- DBMS_RANDOM.VALUE(1,10)
- -----------------------
- 9.86601968210438
-
- SQL> select dbms_random.value(1,10) from dual;
-
- DBMS_RANDOM.VALUE(1,10)
- -----------------------
- 3.43475105499398
3.2 Examples
Sorting by DBMS_RANDOM.RANDOM shuffles the rows, so the first 5 differ on every run:
- SQL> select * from (select object_name from zeeno order by dbms_random.random) where rownum<6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- /6dd0fe0e_CertificateCertifica
- /cf5224d7_SunJSSE_a4
- KU$_PARSED_ITEMS
- javax/swing/text/IconView
- oracle/xml/jdwp/XSLJDWPString
-
- SQL> select * from (select object_name from zeeno order by dbms_random.random) where rownum<6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- java/io/ObjectOutputStream$1
- sun/security/krb5/KrbAsReq
- /2d52a21c_Last
- SYS_YOID0000006594$
- /308fbfa1_BeanContextServices
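Sorting by a truncated value is a much weaker shuffle. TRUNC(DBMS_RANDOM.VALUE(1,3)) can only be 1 or 2, so the rows fall into just two groups and the first rows returned overlap between runs (ICOL$ and UET$ appear in both outputs below):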
- SQL> select * from (select object_name from zeeno order by trunc(dbms_random.value(1,3))) where rownum<6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- ICOL$
- C_COBJ#
- PROXY_ROLE_DATA$
- I_OBJ#
- UET$
-
- SQL> select * from (select object_name from zeeno order by trunc(dbms_random.value(1,3))) where rownum<6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- ICOL$
- UNDO$
- I_PROXY_ROLE_DATA$_1
- I_CDEF2
- UET$
Random integers and floating-point numbers in a range:
- SQL> select trunc(dbms_random.value(0, 1000)) randomNum from dual; -- integer from 0 to 999
-
- RANDOMNUM
- ----------
- 790
-
- SQL> select dbms_random.value(0, 1000) randomNum from dual; -- floating-point number, 0 <= x < 1000
-
- RANDOMNUM
- ----------
- 997.876726
4. Using the built-in function sys_guid()
- SQL> select * from (select OBJECT_NAME from zeeno order by sys_guid()) where rownum < 6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- /6bedadd5_KeyManagerFactory1
- /ffd795c8_AddCRIF
- TABLE_EXPORT_OBJECTS
- /278cd3a4_CGParselet
- KU$_REFCOL_T
-
- SQL> select * from (select OBJECT_NAME from zeeno order by sys_guid()) where rownum < 6;
-
- OBJECT_NAME
- --------------------------------------------------------------------------------
- sun/awt/InputMethodSupport
- V_$RESTORE_POINT
- COLORSLIST
- java/util/WeakHashMap$Entry
- DBMSOUTPUT_LINESARRAY
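Note: with the sys_guid() approach you sometimes get the same rows back, that is, the same result set as the previous query. According to the material I found, some people attribute this to the operating system: on Windows it behaves as expected and the data returned is random, while on Linux and other platforms the result set stays the same every time. Others attribute it to sys_guid() itself: the function generates a 16-byte globally unique identifier, which on most platforms is built from a host identifier and a process or thread identifier. In other words, the value is unique and is likely to look random, but nothing guarantees that it is.
So, to make sure the rows read are random every time on any platform, we mostly use the SAMPLE clause or the DBMS_RANDOM package to get a random result set. SAMPLE is the more common choice because it narrows the range of data scanned: when querying a large table and extracting a relatively small amount of data, it speeds up the query noticeably.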
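Coming back to the requirement at the top (3,000 numbers from each of 10 counties), one way to apply the DBMS_RANDOM approach per group is the analytic function ROW_NUMBER. This is a sketch only: the table phone_numbers and its columns county_code and phone_no are hypothetical names, since the real schema is not shown here.

```sql
-- Number the rows of each county in random order, then keep the first 3000.
SELECT county_code, phone_no
FROM (
  SELECT county_code,
         phone_no,
         ROW_NUMBER() OVER (PARTITION BY county_code
                            ORDER BY DBMS_RANDOM.VALUE) AS rn
  FROM phone_numbers
)
WHERE rn <= 3000;
```

Each county contributes exactly 3,000 rows (or all of its rows if it has fewer). Because every row gets its own random sort key, the picks are scattered across the county's data rather than taken from one region of the table. On very large tables, adding a SAMPLE clause to the inner query could reduce the sort cost, at the price of drawing from a subset instead of the full table.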
D) Picking n random rows in other databases:
1. SQL Server: select top n * from table_name order by newid()
select top 10 * from tablename order by NEWID()
2. MySQL: select * from table_name order by rand() limit n
select * from tablename order by rand() limit 10
3. Access: select top n * from table_name order by Rnd(id)
SELECT top 10 * FROM tablename ORDER BY Rnd(FId)
FId is the name of the ID column in your table.