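Next we count the time zones. There are two ways to do this: one uses only the standard library, the other uses pandas. With the standard library, we iterate over the time zones and keep the running counts in a dict: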
import json

path = './Python Data Analysis Example_1.txt'
# Each line of the file is one JSON record
records = [json.loads(line) for line in open(path)]
#print(records[0]['tz'])
# Keep only the records that actually have a 'tz' field
time_zones = [rec['tz'] for rec in records if 'tz' in rec]
#print(time_zones[:10])

def get_counts(sequence):
    # Map each distinct value to the number of times it occurs
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

counts = get_counts(time_zones)
print(counts['America/New_York'])
print(len(time_zones))
Output:
1251
3440
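The if/else branch in get_counts can be avoided with collections.defaultdict from the standard library. The following is only a minimal sketch and not part of the original program: the name get_counts2 and the small sample list are made up for illustration and stand in for the real time_zones data.

```python
from collections import defaultdict

def get_counts2(sequence):
    # Missing keys start at int() == 0, so no membership test is needed
    counts = defaultdict(int)
    for x in sequence:
        counts[x] += 1
    return counts

# Hypothetical sample standing in for time_zones
sample = ['America/New_York', '', 'America/Chicago', 'America/New_York']
sample_counts = get_counts2(sample)
print(sample_counts['America/New_York'])  # 2
```

The result behaves like an ordinary dict, so it can be used wherever the output of get_counts is used.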
To get the top ten time zones together with their counts, extend the program above:
import json

path = './Python Data Analysis Example_1.txt'
# Each line of the file is one JSON record
records = [json.loads(line) for line in open(path)]
#print(records[0]['tz'])
# Keep only the records that actually have a 'tz' field
time_zones = [rec['tz'] for rec in records if 'tz' in rec]
#print(time_zones[:10])

def get_counts(sequence):
    # Map each distinct value to the number of times it occurs
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

def top_counts(count_dict, n=10):
    # Build (count, tz) pairs so that sorting orders them by count
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    # The last n pairs are the largest, in ascending order
    return value_key_pairs[-n:]

counts = get_counts(time_zones)
print(counts['America/New_York'])
print(len(time_zones))
print(top_counts(counts))
Output:
1251
3440
[(33, 'America/Sao_Paulo'), (35, 'Europe/Madrid'), (36, 'Pacific/Honolulu'), (37, 'Asia/Tokyo'), (74, 'Europe/London'), (191, 'America/Denver'), (382, 'America/Los_Angeles'), (400, 'America/Chicago'), (521, ''), (1251, 'America/New_York')]
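Note that top_counts sorts every pair and returns the top ten in ascending order, so the most frequent time zone comes last. If only the n largest entries are needed, largest first, heapq.nlargest from the standard library is one alternative. This is a sketch under assumptions: top_counts_desc and the sample dict are hypothetical names and data, not part of the original program.

```python
import heapq

def top_counts_desc(count_dict, n=10):
    # nlargest returns the n items with the highest counts, largest first
    return heapq.nlargest(n, count_dict.items(), key=lambda item: item[1])

# Hypothetical counts standing in for the real result of get_counts
sample_counts = {'America/New_York': 3, '': 2, 'America/Chicago': 1}
print(top_counts_desc(sample_counts, n=2))
# [('America/New_York', 3), ('', 2)]
```

Unlike top_counts, this returns (tz, count) pairs rather than (count, tz) pairs.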
We can also use collections.Counter from the Python standard library, which makes this much simpler:
from collections import Counter
counts = Counter(time_zones)
print(counts.most_common(10))
Output:
[('America/New_York', 1251), ('', 521), ('America/Chicago', 400), ('America/Los_Angeles', 382), ('America/Denver', 191), ('Europe/London', 74), ('Asia/Tokyo', 37), ('Pacific/Honolulu', 36), ('Europe/Madrid', 35), ('America/Sao_Paulo', 33)]
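The second entry in the output, '', is the group of records whose tz field is an empty string. One possible way to make that group easier to read is to relabel it before counting. The sketch below assumes a placeholder label 'Unknown' and uses a small made-up list in place of time_zones; neither appears in the original program.

```python
from collections import Counter

# Hypothetical sample standing in for time_zones
sample = ['America/New_York', '', 'America/Chicago', '',
          'America/New_York', 'America/New_York']

# Replace empty strings with a placeholder label (an assumption, not in the source)
labeled = [tz if tz else 'Unknown' for tz in sample]

labeled_counts = Counter(labeled)
print(labeled_counts.most_common(2))
# [('America/New_York', 3), ('Unknown', 2)]
```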

