python - Rebuilding an array with nested defaultdicts

This question extends an earlier one: rebuild python array based on common elements
 - but it is different enough to warrant a new question:

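I have been struggling with this for a while. My data is an array of dictionaries from a SQL query. Each element of the array represents a shipment, and elements share common values under certain keys.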

data = [
    {"CustName":"customer1", "PartNum":"part1", "delKey":"0001", "qty":"10", "memo":"blah1"},
    {"CustName":"customer1", "PartNum":"part1", "delKey":"0002", "qty":"10", "memo":"blah2"},
    {"CustName":"customer1", "PartNum":"part1", "delKey":"0003", "qty":"10", "memo":"blah3"},
    {"CustName":"customer2", "PartNum":"part3", "delKey":"0004", "qty":"20", "memo":"blah4"},
    {"CustName":"customer2", "PartNum":"part3", "delKey":"0005", "qty":"20", "memo":"blah5"},
    {"CustName":"customer3", "PartNum":"partXYZ", "delKey":"0006", "qty":"50", "memo":"blah6"},
    {"CustName":"customer3", "PartNum":"partABC", "delKey":"0007", "qty":"100", "memo":"blah7"}]

The output I want is grouped by specific keys:

dataOut = [
   {"CustName":"customer1", "Parts":[
        {"PartNum":"part1", "deliveries":[
            {"delKey":"0001", "qty":"10", "memo":"blah1"},
            {"delKey":"0002", "qty":"10", "memo":"blah2"},
            {"delKey":"0003", "qty":"10", "memo":"blah3"}]}]},
   {"CustName":"customer2", "Parts":[
        {"PartNum":"part3", "deliveries":[
            {"delKey":"0004", "qty":"20", "memo":"blah4"},
            {"delKey":"0005", "qty":"20", "memo":"blah5"}]}]},
   {"CustName":"customer3", "Parts":[
        {"PartNum":"partXYZ", "deliveries":[
            {"delKey":"0006", "qty":"50", "memo":"blah6"}]},
        {"PartNum":"partABC", "deliveries":[
            {"delKey":"0007", "qty":"100", "memo":"blah7"}]}]}]

I can get a single level of grouping using the defaultdict and list comprehension suggested in the previous question, with a small modification:

from collections import defaultdict

d = defaultdict(list)
for item in data:
    d[item['CustName']].append(item)
print([{'CustName': key, 'parts': value} for key, value in d.items()])

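But I cannot seem to get the second level of the output array: the grouping by the PartNum key. From some research, I think what I need to do is use a defaultdict as the type of the outer `defaultdict`, like this: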

d = defaultdict(defaultdict(list))

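This raises an error because defaultdict expects a function (a callable factory) as its argument, not a defaultdict instance, so I need to use a lambda (right?)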

d = defaultdict(lambda: defaultdict(list))
for item in data:
    d[item['CustName']].append(item)  # <---- this?

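My question is how to "access" the second-level array inside the loop and tell the "inner" defaultdict what to group on (PartNum). The data comes to me from the database programmer, and the project keeps evolving to add more and more data (keys), so I would like the solution to be as general as possible in case more data gets thrown my way. I would like to be able to "chain" the defaultdicts depending on how many levels I need. I am still learning, so I have been trying to understand the basics of lambda and the defaultdict type and where to go from there.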

Solution:

Using groupby as suggested by @Pynchia, and sorting the unordered data first as suggested by @hege_hegedus:

from itertools import groupby
dataOut = []
dataSorted = sorted(data, key=lambda x: (x["CustName"], x["PartNum"]))
for cust_name, cust_group in groupby(dataSorted, lambda x: x["CustName"]):
    dataOut.append({
        "CustName": cust_name,
        "Parts": [],
    })
    for part_num, part_group in groupby(cust_group, lambda x: x["PartNum"]):
        dataOut[-1]["Parts"].append({
            "PartNum": part_num,
            "deliveries": [{
                "delKey": delivery["delKey"],
                "memo": delivery["memo"],
                "qty": delivery["qty"],
            } for delivery in part_group]
        })

If you look at the second for loop, it should hopefully answer your question about accessing the second-level array inside the loop.
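For comparison, here is a minimal sketch of the nested defaultdict approach the question was originally reaching for. It is not taken from the answers above, and it assumes Python 3.7+ so that dictionaries keep insertion order. The names `grouped`, `delivery`, and `data_out` are illustrative only, and the sample rows are a shortened copy of the question's data. The key point is that the inner defaultdict is reached by indexing twice, once with CustName and once with PartNum, and the append happens on the innermost list.

```python
from collections import defaultdict

# Sample rows shaped like the question's data (shortened for illustration).
data = [
    {"CustName": "customer1", "PartNum": "part1", "delKey": "0001", "qty": "10", "memo": "blah1"},
    {"CustName": "customer1", "PartNum": "part1", "delKey": "0002", "qty": "10", "memo": "blah2"},
    {"CustName": "customer3", "PartNum": "partXYZ", "delKey": "0006", "qty": "50", "memo": "blah6"},
    {"CustName": "customer3", "PartNum": "partABC", "delKey": "0007", "qty": "100", "memo": "blah7"},
]

# Outer key: CustName. Inner key: PartNum. Leaf: list of delivery dicts.
grouped = defaultdict(lambda: defaultdict(list))
for item in data:
    # Every key that is not a grouping key stays on the delivery record,
    # so newly added columns are carried along without code changes.
    delivery = {k: v for k, v in item.items() if k not in ("CustName", "PartNum")}
    grouped[item["CustName"]][item["PartNum"]].append(delivery)

# Reshape the nested mapping into the list-of-dicts layout from the question.
data_out = [
    {
        "CustName": cust_name,
        "Parts": [
            {"PartNum": part_num, "deliveries": deliveries}
            for part_num, deliveries in parts.items()
        ],
    }
    for cust_name, parts in grouped.items()
]
```

Unlike groupby, this variant does not need the input to be sorted first; groups appear in the order their keys are first seen.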
